Model parameters: d_model 1792 ffw_size 7168 kv_size 128 n_heads 14 n_layers 26
Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 26 --hidden-size 1792 --num-attention-heads 14 --kv-channels 128 --ffn-hidden-size 7168 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 1 --global-batch-size 256 --train-samples 122_070_313 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-1b1250b1b5 --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 122_070_313 --lr-warmup-samples 1_220_703 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 100 --save-interval 20000 --eval-interval 10000 --eval-iters 1 --tensorboard-dir tensorboard_1b1250b1b5 --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_1b1250b1b5 --load checkpoints_1b1250b1b5 --train-weighted-split-paths-path train1b5.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3420535.json --zero-stage 0
START 3420535: Wed 26 Apr 2023 11:59:42 PM EEST
0:
0:
0: ======================= ROCm System Management Interface =======================
0: ================================= Concise Info =================================
0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0: 0 44.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
0: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
0: 2 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
0: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
0: 4 39.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
0: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
0: 6 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
0: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
0: ================================================================================
0: ============================= End of ROCm SMI Log ==============================
5:
5:
5: ======================= ROCm System Management Interface =======================
5: ================================= Concise Info =================================
5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
5: 0 45.0c 99.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
5: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
5: 2 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
5: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
5: 4 41.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
5: 5 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
5: 6 42.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
5: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
5: ================================================================================
5: ============================= End of ROCm SMI Log ==============================
7:
7:
7: ======================= ROCm System Management Interface =======================
7: ================================= Concise Info =================================
7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
7: 0 44.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
7: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
7: 2 39.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
7: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
7: 4 39.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
7: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
7: 6 37.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
7: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
7: ================================================================================
7: ============================= End of ROCm SMI Log ==============================
15:
15:
15: ======================= ROCm System Management Interface =======================
15: ================================= Concise Info =================================
15: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
15: 0 46.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
15: 1 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
15: 2 48.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
15: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
15: 4 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
15: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
15: 6 40.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
15: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
15: ================================================================================
15: ============================= End of ROCm SMI Log ==============================
10:
10:
10: ======================= ROCm System Management Interface =======================
10: ================================= Concise Info =================================
10: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
10: 0 42.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
10: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
10: 2 37.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
10: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
10: 4 46.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
10: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
10: 6 37.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
10: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
10: ================================================================================
10: ============================= End of ROCm SMI Log ==============================
13:
13:
13: ======================= ROCm System Management Interface =======================
13: ================================= Concise Info =================================
13: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
13: 0 47.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
13: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
13: 2 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
13: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
13: 4 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
13: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
13: 6 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
13: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
13: ================================================================================
13: ============================= End of ROCm SMI Log ==============================
9:
9:
9: ======================= ROCm System Management Interface =======================
9: ================================= Concise Info =================================
9: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
9: 0 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
9: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
9: 2 44.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
9: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
9: 4 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
9: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
9: 6 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
9: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
9: ================================================================================
9: ============================= End of ROCm SMI Log ==============================
17:
17:
17: ======================= ROCm System Management Interface =======================
17: ================================= Concise Info =================================
17: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
17: 0 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
17: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
17: 2 46.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
17: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
17: 4 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
17: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
17: 6 39.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
17: 7 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
17: ================================================================================
17: ============================= End of ROCm SMI Log ==============================
27:
27:
27: ======================= ROCm System Management Interface =======================
27: ================================= Concise Info =================================
27: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
27: 0 45.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
27: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
27: 2 40.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
27: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
27: 4 44.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
27: 5 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
27: 6 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
27: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
27: ================================================================================
27: ============================= End of ROCm SMI Log ==============================
28:
28:
28: ======================= ROCm System Management Interface =======================
28: ================================= Concise Info =================================
28: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
28: 0 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
28: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
28: 2 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
28: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
28: 4 45.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
28: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
28: 6 43.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
28: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
28: ================================================================================
28: ============================= End of ROCm SMI Log ==============================
23:
23:
23: ======================= ROCm System Management Interface =======================
23: ================================= Concise Info =================================
23: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
23: 0 43.0c 98.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
23: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
23: 2 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
23: 3 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
23: 4 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
23: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
23: 6 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
23: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
23: ================================================================================
23: ============================= End of ROCm SMI Log ==============================
30:
30:
30: ======================= ROCm System Management Interface =======================
30: ================================= Concise Info =================================
30: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
30: 0 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
30: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
30: 2 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
30: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
30: 4 40.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
30: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
30: 6 39.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
30: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
30: ================================================================================
30: ============================= End of ROCm SMI Log ==============================
16:
16:
16: ======================= ROCm System Management Interface =======================
16: ================================= Concise Info =================================
16: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
16: 0 46.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
16: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
16: 2 40.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
16: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
16: 4 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
16: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
16: 6 37.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
16: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
16: ================================================================================
16: ============================= End of ROCm SMI Log ==============================
4:
4:
4: ======================= ROCm System Management Interface =======================
4: ================================= Concise Info =================================
4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
4: 0 45.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
4: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
4: 2 36.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
4: 3 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
4: 4 46.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
4: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
4: 6 34.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
4: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
4: ================================================================================
4: ============================= End of ROCm SMI Log ==============================
12:
12:
12: ======================= ROCm System Management Interface =======================
12: ================================= Concise Info =================================
12: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
12: 0 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
12: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
12: 2 38.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
12: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
12: 4 45.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
12: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
12: 6 40.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
12: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
12: ================================================================================
12: ============================= End of ROCm SMI Log ==============================
20:
20:
20: ======================= ROCm System Management Interface =======================
20: ================================= Concise Info =================================
20: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
20: 0 41.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
20: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
20: 2 37.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
20: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
20: 4 48.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
20: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
20: 6 41.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
20: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
20: ================================================================================
20: ============================= End of ROCm SMI Log ==============================
29:
29:
29: ======================= ROCm System Management Interface =======================
29: ================================= Concise Info =================================
29: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
29: 0 46.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
29: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
29: 2 40.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
29: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
29: 4 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
29: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
29: 6 40.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
29: 7 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
29: ================================================================================
29: ============================= End of ROCm SMI Log ==============================
24:
24:
24: ======================= ROCm System Management Interface =======================
24: ================================= Concise Info =================================
24: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
24: 0 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
24: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
24: 2 37.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
24: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
24: 4 43.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
24: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
24: 6 43.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
24: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
24: ================================================================================
24: ============================= End of ROCm SMI Log ==============================
18:
18:
18: ======================= ROCm System Management Interface =======================
18: ================================= Concise Info =================================
18: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
18: 0 40.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
18: 1 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
18: 2 44.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
18: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
18: 4 44.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
18: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
18: 6 34.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
18: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
18: ================================================================================
18: ============================= End of ROCm SMI Log ==============================
25:
25:
25: ======================= ROCm System Management Interface =======================
25: ================================= Concise Info =================================
25: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
25: 0 47.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
25: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
25: 2 40.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
25: 3 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
25: 4 38.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
25: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
25: 6 35.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
25: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
25: ================================================================================
25: ============================= End of ROCm SMI Log ==============================
21:
21:
21: ======================= ROCm System Management Interface =======================
21: ================================= Concise Info =================================
21: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
21: 0 47.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
21: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
21: 2 39.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
21: 3 38.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
21: 4 41.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
21: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
21: 6 39.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
21: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
21: ================================================================================
21: ============================= End of ROCm SMI Log ==============================
26:
26:
26: ======================= ROCm System Management Interface =======================
26: ================================= Concise Info =================================
26: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
26: 0 42.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
26: 1 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
26: 2 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
26: 3 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
26: 4 37.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
26: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
26: 6 43.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
26: 7 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
26: ================================================================================
26: ============================= End of ROCm SMI Log ==============================
31:
31:
31: ======================= ROCm System Management Interface =======================
31: ================================= Concise Info =================================
31: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
31: 0 50.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
31: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
31: 2 40.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
31: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
31: 4 45.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
31: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
31: 6 42.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
31: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
31: ================================================================================
31: ============================= End of ROCm SMI Log ==============================
6:
6:
6: ======================= ROCm System Management Interface =======================
6: ================================= Concise Info =================================
6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
6: 0 45.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
6: 1 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
6: 2 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
6: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
6: 4 44.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
6: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
6: 6 40.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
6: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
6: ================================================================================
6: ============================= End of ROCm SMI Log ==============================
1:
1:
1: ======================= ROCm System Management Interface =======================
1: ================================= Concise Info =================================
1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
1: 0 42.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
1: 1 51.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
1: 2 41.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
1: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
1: 4 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
1: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
1: 6 41.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
1: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
1: ================================================================================
1: ============================= End of ROCm SMI Log ==============================
8:
8:
8: ======================= ROCm System Management Interface =======================
8: ================================= Concise Info =================================
8: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
8: 0 44.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
8: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
8: 2 38.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
8: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
8: 4 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
8: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
8: 6 36.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
8: 7 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
8: ================================================================================
8: ============================= End of ROCm SMI Log ==============================
14:
14:
14: ======================= ROCm System Management Interface =======================
14: ================================= Concise Info =================================
14: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
14: 0 44.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
14: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
14: 2 41.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
14: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
14: 4 41.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
14: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
14: 6 48.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
14: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
14: ================================================================================
14: ============================= End of ROCm SMI Log ==============================
19:
19:
19: ======================= ROCm System Management Interface =======================
19: ================================= Concise Info =================================
19: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
19: 0 44.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
19: 1 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
19: 2 41.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
19: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
19: 4 44.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
19: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
19: 6 38.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
19: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
19: ================================================================================
19: ============================= End of ROCm SMI Log ==============================
11:
11:
11: ======================= ROCm System Management Interface =======================
11: ================================= Concise Info =================================
11: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
11: 0 48.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
11: 1 54.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
11: 2 39.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
11: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
11: 4 40.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
11: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
11: 6 39.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
11: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
11: ================================================================================
11: ============================= End of ROCm SMI Log ==============================
22:
22:
22: ======================= ROCm System Management Interface =======================
22: ================================= Concise Info =================================
22: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
22: 0 40.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
22: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
22: 2 38.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
22: 3 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
22: 4 41.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
22: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
22: 6 41.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
22: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
22: ================================================================================
22: ============================= End of ROCm SMI Log ==============================
2:
2:
2: ======================= ROCm System Management Interface =======================
2: ================================= Concise Info =================================
2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
2: 0 43.0c 102.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
2: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
2: 2 39.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
2: 3 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
2: 4 44.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
2: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
2: 6 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
2: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
2: ================================================================================
2: ============================= End of ROCm SMI Log ==============================
3:
3:
3: ======================= ROCm System Management Interface =======================
3: ================================= Concise Info =================================
3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
3: 0 45.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
3: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
3: 2 41.0c 82.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
3: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
3: 4 44.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
3: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
3: 6 42.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0%
3: 7 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0%
3: ================================================================================
3: ============================= End of ROCm SMI Log ==============================
0: Launching on nid005142 (0/32), master nid005142 port 9999, GPUs 8, CUDA: True
16: Launching on nid006128 (16/32), master nid005142 port 9999, GPUs 8, CUDA: True
28: Launching on nid006140 (28/32), master nid005142 port 9999, GPUs 8, CUDA: True
5: Launching on nid005147 (5/32), master nid005142 port 9999, GPUs 8, CUDA: True
7: Launching on nid005149 (7/32), master nid005142 port 9999, GPUs 8, CUDA: True
24: Launching on nid006136 (24/32), master nid005142 port 9999, GPUs 8, CUDA: True
13: Launching on nid006125 (13/32), master nid005142 port 9999, GPUs 8, CUDA: True
17: Launching on nid006129 (17/32), master nid005142 port 9999, GPUs 8, CUDA: True
23: Launching on nid006135 (23/32), master nid005142 port 9999, GPUs 8, CUDA: True
4: Launching on nid005146 (4/32), master nid005142 port 9999, GPUs 8, CUDA: True
9: Launching on nid005151 (9/32), master nid005142 port 9999, GPUs 8, CUDA: True
30: Launching on nid006142 (30/32), master nid005142 port 9999, GPUs 8, CUDA: True
15: Launching on nid006127 (15/32), master nid005142 port 9999, GPUs 8, CUDA: True
21: Launching on nid006133 (21/32), master nid005142 port 9999, GPUs 8, CUDA: True
10: Launching on nid006122 (10/32), master nid005142 port 9999, GPUs 8, CUDA: True
6: Launching on nid005148 (6/32), master nid005142 port 9999, GPUs 8, CUDA: True
20: Launching on nid006132 (20/32), master nid005142 port 9999, GPUs 8, CUDA: True
12: Launching on nid006124 (12/32), master nid005142 port 9999, GPUs 8, CUDA: True
1: Launching on nid005143 (1/32), master nid005142 port 9999, GPUs 8, CUDA: True
26: Launching on nid006138 (26/32), master nid005142 port 9999, GPUs 8, CUDA: True
31: Launching on nid006143 (31/32), master nid005142 port 9999, GPUs 8, CUDA: True
29: Launching on nid006141 (29/32), master nid005142 port 9999, GPUs 8, CUDA: True
27: Launching on nid006139 (27/32), master nid005142 port 9999, GPUs 8, CUDA: True
18: Launching on nid006130 (18/32), master nid005142 port 9999, GPUs 8, CUDA: True
14: Launching on nid006126 (14/32), master nid005142 port 9999, GPUs 8, CUDA: True
25: Launching on nid006137 (25/32), master nid005142 port 9999, GPUs 8, CUDA: True
19: Launching on nid006131 (19/32), master nid005142 port 9999, GPUs 8, CUDA: True
8: Launching on nid005150 (8/32), master nid005142 port 9999, GPUs 8, CUDA: True
11: Launching on nid006123 (11/32), master nid005142 port 9999, GPUs 8, CUDA: True
3: Launching on nid005145 (3/32), master nid005142 port 9999, GPUs 8, CUDA: True
2: Launching on nid005144 (2/32), master nid005142 port 9999, GPUs 8, CUDA: True
22: Launching on nid006134 (22/32), master nid005142 port 9999, GPUs 8, CUDA: True
0: using world size: 256, data-parallel-size: 256, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
0: accumulate and all-reduce gradients in fp32 for bfloat16 data type.
0: using torch.bfloat16 for parameters ...
0: ------------------------ arguments ------------------------
0: abort_on_unmet_fused_kernel_constraints ......... False
0: accumulate_allreduce_grads_in_fp32 .............. True
0: adam_beta1 ...................................... 0.9
0: adam_beta2 ...................................... 0.999
0: adam_eps ........................................ 1e-08
0: adlr_autoresume ................................. False
0: adlr_autoresume_interval ........................ 1000
0: apply_query_key_layer_scaling ................... True
0: apply_residual_connection_post_layernorm ........ False
0: attention_dropout ............................... 0.1
0: attention_softmax_in_fp32 ....................... False
0: bert_binary_head ................................ True
0: bert_load ....................................... None
0: bf16 ............................................ True
0: bias_dropout_fusion ............................. True
0: bias_gelu_fusion ................................ True
0: biencoder_projection_dim ........................ 0
0: biencoder_shared_query_context_model ............ False
0: block_data_path ................................. None
0: checkpoint_activations .......................... False
0: checkpoint_in_cpu ............................... False
0: checkpoint_num_layers ........................... 1
0: clip_grad ....................................... 1.0
0: codecarbon_dir .................................. None
0: consumed_train_samples .......................... 0
0: consumed_train_tokens ........................... 0
0: consumed_valid_samples .......................... 0
0: contigious_checkpointing ........................ False
0: cpu_optimizer ................................... False
0: cpu_torch_adam .................................. False
0: curriculum_learning ............................. False
0: data_impl ....................................... mmap
0: data_parallel_size .............................. 256
0: data_path ....................................... None
0: dataloader_type ................................. single
0: DDP_impl ........................................ local
0: decoder_seq_length .............................. None
0: deepscale ....................................... False
0: deepscale_config ................................ None
0: deepspeed ....................................... True
0: deepspeed_activation_checkpointing .............. False
0: deepspeed_config ................................ ds_configs/3420535.json
0: deepspeed_mpi ................................... False
0: distribute_checkpointed_activations ............. False
0: distributed_backend ............................. nccl
0: embed_layernorm ................................. False
0: embedding_path .................................. None
0: encoder_seq_length .............................. 2048
0: eod_mask_loss ................................... False
0: eval_interval ................................... 10000
0: eval_iters ...................................... 1
0: eval_only ....................................... None
0: evidence_data_path .............................. None
0: exit_duration_in_mins ........................... None
0: exit_interval ................................... None
0: ffn_hidden_size ................................. 7168
0: finetune ........................................ False
0: fp16 ............................................ False
0: fp16_lm_cross_entropy ........................... False
0: fp32_residual_connection ........................ False
0: gigaflos_no_embeds .............................. 0
0: global_batch_size ............................... 256
0: glu_activation .................................. None
0: hidden_dropout .................................. 0.1
0: hidden_size ..................................... 1792
0: hysteresis ...................................... 2
0: ict_head_size ................................... None
0: ict_load ........................................ None
0: img_dim ......................................... 224
0: indexer_batch_size .............................. 128
0: indexer_log_interval ............................ 1000
0: inference ....................................... False
0: init_method_std ................................. 0.02
0: init_method_xavier_uniform ...................... False
0: initial_loss_scale .............................. 4294967296
0: kill_switch_path ................................ kill-switch-1b1250b1b5
0: kv_channels ..................................... 128
0: layer_norm_fusion ............................... True
0: layernorm_epsilon ............................... 1e-05
0: lazy_mpu_init ................................... None
0: load ............................................ checkpoints_1b1250b1b5
0: local_rank ...................................... None
0: log_batch_size_to_tensorboard ................... True
0: log_interval .................................... 100
0: log_learning_rate_to_tensorboard ................ True
0: log_level ....................................... None
0: log_level_replica ............................... None
0: log_loss_scale_to_tensorboard ................... True
0: log_num_zeros_in_grad ........................... False
0: log_params_norm ................................. False
0: log_path ........................................ None
0: log_timers_to_tensorboard ....................... True
0: log_validation_ppl_to_tensorboard ............... True
0: loss_on_targets_only ............................ False
0: loss_scale ...................................... None
0: loss_scale_window ............................... 1000
0: lr .............................................. 0.0002
0: lr_decay_iters .................................. None
0: lr_decay_samples ................................ 122070313
0: lr_decay_style .................................. cosine
0: lr_decay_tokens ................................. None
0: lr_warmup_fraction .............................. None
0: lr_warmup_iters ................................. 0
0: lr_warmup_samples ............................... 1220703
0: make_vocab_size_divisible_by .................... 128
0: mask_prob ....................................... 0.15
0: masked_softmax_fusion ........................... True
0: max_position_embeddings ......................... 2048
0: mean_noise_span_length .......................... None
0: memory_centric_tiled_linear ..................... False
0: merge_file ...................................... gpt2/merges.txt
0: micro_batch_size ................................ 1
0: min_loss_scale .................................. 1.0
0: min_lr .......................................... 2e-05
0: mmap_warmup ..................................... False
0: no_load_optim ................................... None
0: no_load_rng ..................................... None
0: no_save_optim ................................... None
0: no_save_rng ..................................... None
0: noise_density ................................... None
0: num_attention_heads ............................. 14
0: num_channels .................................... 3
0: num_classes ..................................... 1000
0: num_layers ...................................... 26
0: num_layers_per_virtual_pipeline_stage ........... None
0: num_workers ..................................... 2
0: onnx_safe .......................................
None 0: openai_gelu ..................................... False 0: optimizer ....................................... adam 0: optimizer_fusion ................................ True 0: override_lr_scheduler ........................... False 0: pad_vocab_size_to ............................... None 0: params_dtype .................................... torch.bfloat16 0: partition_activations ........................... False 0: patch_dim ....................................... 16 0: pipeline_model_parallel_size .................... 1 0: position_embedding_type ......................... PositionEmbeddingType.absolute 0: pp_partition_method ............................. None 0: profile_backward ................................ False 0: query_in_block_prob ............................. 0.1 0: rampup_batch_size ............................... None 0: rank ............................................ 0 0: remote_device ................................... none 0: reset_attention_mask ............................ False 0: reset_position_ids .............................. False 0: reset_progress .................................. None 0: retriever_report_topk_accuracies ................ [] 0: retriever_score_scaling ......................... False 0: retriever_seq_length ............................ 256 0: reweight_loss_based_on_position_frequency ....... False 0: sample_rate ..................................... 1.0 0: save ............................................ checkpoints_1b1250b1b5 0: save_interval ................................... 20000 0: scatter_gather_tensors_in_pipeline .............. True 0: scattered_embeddings ............................ False 0: seed ............................................ 1234 0: seq_length ...................................... 2048 0: sgd_momentum .................................... 0.9 0: short_seq_prob .................................. 0.1 0: skip_train_iteration_range ...................... 
None 0: split ........................................... None 0: split_transformers .............................. False 0: sync_tp_duplicated_parameters ................... False 0: synchronize_each_layer .......................... False 0: tensor_model_parallel_size ...................... 1 0: tensorboard_dir ................................. tensorboard_1b1250b1b5 0: tensorboard_log_interval ........................ 1 0: tensorboard_queue_size .......................... 5 0: test_weighted_split_paths ....................... None 0: test_weighted_split_paths_path .................. None 0: tile_factor ..................................... 1 0: titles_data_path ................................ None 0: tokenizer_name_or_path .......................... None 0: tokenizer_type .................................. GPT2BPETokenizer 0: train_iters ..................................... None 0: train_samples ................................... 122070313 0: train_tokens .................................... None 0: train_weighted_split_names ...................... ['train'] 0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document']] 0: train_weighted_split_paths_path ................. None 0: train_weighted_split_splits ..................... [['0:1']] 0: train_weighted_split_weights .................... [['1.0']] 0: universal_checkpoint ............................ False 0: use_bnb_optimizer ............................... False 0: use_checkpoint_lr_scheduler ..................... False 0: use_contiguous_buffers_in_ddp ................... True 0: use_cpu_initialization .......................... None 0: use_one_sent_docs ............................... False 0: use_pin_memory .................................. False 0: valid_num_workers ............................... 2 0: valid_weighted_split_names ...................... ['validation'] 0: valid_weighted_split_paths ...................... 
[['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] 0: valid_weighted_split_paths_path ................. None 0: valid_weighted_split_splits ..................... [['0:1']] 0: valid_weighted_split_weights .................... [['1.0']] 0: virtual_pipeline_model_parallel_size ............ None 0: vocab_extra_ids ................................. 0 0: vocab_file ...................................... gpt2/vocab.json 0: weight_decay .................................... 0.1 0: world_size ...................................... 256 0: zero_allgather_bucket_size ...................... 0.0 0: zero_contigious_gradients ....................... False 0: zero_reduce_bucket_size ......................... 0.0 0: zero_reduce_scatter ............................. False 0: zero_stage ...................................... 0 0: -------------------- end of arguments --------------------- 0: setting number of micro-batches to constant 1 0: > building GPT2BPETokenizer tokenizer ... 0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) 0: DeepSpeed general environment info: 0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] 0: torch version .................... 1.13.0+rocm5.2 0: torch cuda version ............... None 0: torch hip version ................ 5.2.21151-afdc89f8 0: nvcc version ..................... None 0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] 0: deepspeed info ................... 0.7.5, unknown, unknown 0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 31: > setting tensorboard ... 0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** 0: > initializing torch distributed ... 
0: [2023-04-27 00:00:38,094] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl 0: > initializing tensor model parallel with size 1 0: > initializing pipeline model parallel with size 1 0: > setting random seeds to 1234 ... 0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 0: > compiling dataset index builder ... 0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' 0: make: Nothing to be done for 'default'. 0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' 0: >>> done with dataset index builder. Compilation time: 0.097 seconds 0: WARNING: constraints for invoking optimized fused softmax kernel are not met. We default back to unfused kernel invocations. 0: > compiling and loading fused kernels ... 
0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 87 0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.o scaled_upper_triang_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/opt/rocm/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 63 0: ninja: no work to do. 
0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 67 0: ninja: no work to do. 0: >>> done with compiling and loading fused kernels. 
Compilation time: 21.056 seconds 0: time to initialize megatron (seconds): 22.781 0: [after megatron is initialized] datetime: 2023-04-27 00:01:10 0: building GPT model ... 0: [2023-04-27 00:01:10,958] [INFO] [utils.py:827:see_memory_usage] Before Building Model 0: [2023-04-27 00:01:10,959] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB 0: [2023-04-27 00:01:10,960] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 38.7 GB, percent = 7.7% 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi 0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 
32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=4 0: 6, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63, ProcessCoord(pipe=0, data=64, model=0): 64, ProcessCoord(pipe=0, data=65, model=0): 65, ProcessCoord(pipe=0, data=66, model=0): 66, ProcessCoord(pipe=0, data=67, model=0): 67, ProcessCoord(pipe=0, data=68, model=0): 68, ProcessCoord(pipe=0, data=69, model=0): 0: 69, ProcessCoord(pipe=0, data=70, model=0): 70, ProcessCoord(pipe=0, data=71, model=0): 71, ProcessCoord(pipe=0, data=72, model=0): 72, ProcessCoord(pipe=0, data=73, model=0): 73, ProcessCoord(pipe=0, data=74, model=0): 74, ProcessCoord(pipe=0, data=75, model=0): 75, ProcessCoord(pipe=0, data=76, model=0): 76, ProcessCoord(pipe=0, data=77, model=0): 77, 
ProcessCoord(pipe=0, data=78, model=0): 78, ProcessCoord(pipe=0, data=79, model=0): 79, ProcessCoord(pipe=0, data=80, model=0): 80, ProcessCoord(pipe=0, data=81, model=0): 81, ProcessCoord(pipe=0, data=82, model=0): 82, ProcessCoord(pipe=0, data=83, model=0): 83, ProcessCoord(pipe=0, data=84, model=0): 84, ProcessCoord(pipe=0, data=85, model=0): 85, ProcessCoord(pipe=0, data=86, model=0): 86, ProcessCoord(pipe=0, data=87, model=0): 87, ProcessCoord(pipe=0, data=88, model=0): 88, ProcessCoord(pipe=0, data=89, model=0): 89, ProcessCoord(pipe=0, data=90, model=0): 90, ProcessCoord(pipe=0, data=91, model=0): 91, ProcessCoord(pipe=0, data=92, model=0): 92, Process 0: Coord(pipe=0, data=93, model=0): 93, ProcessCoord(pipe=0, data=94, model=0): 94, ProcessCoord(pipe=0, data=95, model=0): 95, ProcessCoord(pipe=0, data=96, model=0): 96, ProcessCoord(pipe=0, data=97, model=0): 97, ProcessCoord(pipe=0, data=98, model=0): 98, ProcessCoord(pipe=0, data=99, model=0): 99, ProcessCoord(pipe=0, data=100, model=0): 100, ProcessCoord(pipe=0, data=101, model=0): 101, ProcessCoord(pipe=0, data=102, model=0): 102, ProcessCoord(pipe=0, data=103, model=0): 103, ProcessCoord(pipe=0, data=104, model=0): 104, ProcessCoord(pipe=0, data=105, model=0): 105, ProcessCoord(pipe=0, data=106, model=0): 106, ProcessCoord(pipe=0, data=107, model=0): 107, ProcessCoord(pipe=0, data=108, model=0): 108, ProcessCoord(pipe=0, data=109, model=0): 109, ProcessCoord(pipe=0, data=110, model=0): 110, ProcessCoord(pipe=0, data=111, model=0): 111, ProcessCoord(pipe=0, data=112, model=0): 112, ProcessCoord(pipe=0, data=113, model=0): 113, ProcessCoord(pipe=0, data=114, model=0): 114, ProcessCoord(pipe=0, data=115, mo 0: del=0): 115, ProcessCoord(pipe=0, data=116, model=0): 116, ProcessCoord(pipe=0, data=117, model=0): 117, ProcessCoord(pipe=0, data=118, model=0): 118, ProcessCoord(pipe=0, data=119, model=0): 119, ProcessCoord(pipe=0, data=120, model=0): 120, ProcessCoord(pipe=0, data=121, model=0): 121, 
ProcessCoord(pipe=0, data=122, model=0): 122, ProcessCoord(pipe=0, data=123, model=0): 123, ProcessCoord(pipe=0, data=124, model=0): 124, ProcessCoord(pipe=0, data=125, model=0): 125, ProcessCoord(pipe=0, data=126, model=0): 126, ProcessCoord(pipe=0, data=127, model=0): 127, ProcessCoord(pipe=0, data=128, model=0): 128, ProcessCoord(pipe=0, data=129, model=0): 129, ProcessCoord(pipe=0, data=130, model=0): 130, ProcessCoord(pipe=0, data=131, model=0): 131, ProcessCoord(pipe=0, data=132, model=0): 132, ProcessCoord(pipe=0, data=133, model=0): 133, ProcessCoord(pipe=0, data=134, model=0): 134, ProcessCoord(pipe=0, data=135, model=0): 135, ProcessCoord(pipe=0, data=136, model=0): 136, ProcessCoord(pipe=0, data=137, model=0): 137, 0: ProcessCoord(pipe=0, data=138, model=0): 138, ProcessCoord(pipe=0, data=139, model=0): 139, ProcessCoord(pipe=0, data=140, model=0): 140, ProcessCoord(pipe=0, data=141, model=0): 141, ProcessCoord(pipe=0, data=142, model=0): 142, ProcessCoord(pipe=0, data=143, model=0): 143, ProcessCoord(pipe=0, data=144, model=0): 144, ProcessCoord(pipe=0, data=145, model=0): 145, ProcessCoord(pipe=0, data=146, model=0): 146, ProcessCoord(pipe=0, data=147, model=0): 147, ProcessCoord(pipe=0, data=148, model=0): 148, ProcessCoord(pipe=0, data=149, model=0): 149, ProcessCoord(pipe=0, data=150, model=0): 150, ProcessCoord(pipe=0, data=151, model=0): 151, ProcessCoord(pipe=0, data=152, model=0): 152, ProcessCoord(pipe=0, data=153, model=0): 153, ProcessCoord(pipe=0, data=154, model=0): 154, ProcessCoord(pipe=0, data=155, model=0): 155, ProcessCoord(pipe=0, data=156, model=0): 156, ProcessCoord(pipe=0, data=157, model=0): 157, ProcessCoord(pipe=0, data=158, model=0): 158, ProcessCoord(pipe=0, data=159, model=0): 159, ProcessCoor 0: d(pipe=0, data=160, model=0): 160, ProcessCoord(pipe=0, data=161, model=0): 161, ProcessCoord(pipe=0, data=162, model=0): 162, ProcessCoord(pipe=0, data=163, model=0): 163, ProcessCoord(pipe=0, data=164, model=0): 164, 
ProcessCoord(pipe=0, data=165, model=0): 165, ProcessCoord(pipe=0, data=166, model=0): 166, ProcessCoord(pipe=0, data=167, model=0): 167, ProcessCoord(pipe=0, data=168, model=0): 168, ProcessCoord(pipe=0, data=169, model=0): 169, ProcessCoord(pipe=0, data=170, model=0): 170, ProcessCoord(pipe=0, data=171, model=0): 171, ProcessCoord(pipe=0, data=172, model=0): 172, ProcessCoord(pipe=0, data=173, model=0): 173, ProcessCoord(pipe=0, data=174, model=0): 174, ProcessCoord(pipe=0, data=175, model=0): 175, ProcessCoord(pipe=0, data=176, model=0): 176, ProcessCoord(pipe=0, data=177, model=0): 177, ProcessCoord(pipe=0, data=178, model=0): 178, ProcessCoord(pipe=0, data=179, model=0): 179, ProcessCoord(pipe=0, data=180, model=0): 180, ProcessCoord(pipe=0, data=181, model=0): 181, ProcessCoord(pipe=0, da 0: ta=182, model=0): 182, ProcessCoord(pipe=0, data=183, model=0): 183, ProcessCoord(pipe=0, data=184, model=0): 184, ProcessCoord(pipe=0, data=185, model=0): 185, ProcessCoord(pipe=0, data=186, model=0): 186, ProcessCoord(pipe=0, data=187, model=0): 187, ProcessCoord(pipe=0, data=188, model=0): 188, ProcessCoord(pipe=0, data=189, model=0): 189, ProcessCoord(pipe=0, data=190, model=0): 190, ProcessCoord(pipe=0, data=191, model=0): 191, ProcessCoord(pipe=0, data=192, model=0): 192, ProcessCoord(pipe=0, data=193, model=0): 193, ProcessCoord(pipe=0, data=194, model=0): 194, ProcessCoord(pipe=0, data=195, model=0): 195, ProcessCoord(pipe=0, data=196, model=0): 196, ProcessCoord(pipe=0, data=197, model=0): 197, ProcessCoord(pipe=0, data=198, model=0): 198, ProcessCoord(pipe=0, data=199, model=0): 199, ProcessCoord(pipe=0, data=200, model=0): 200, ProcessCoord(pipe=0, data=201, model=0): 201, ProcessCoord(pipe=0, data=202, model=0): 202, ProcessCoord(pipe=0, data=203, model=0): 203, ProcessCoord(pipe=0, data=204, mode 0: l=0): 204, ProcessCoord(pipe=0, data=205, model=0): 205, ProcessCoord(pipe=0, data=206, model=0): 206, ProcessCoord(pipe=0, data=207, model=0): 207, 
ProcessCoord(pipe=0, data=208, model=0): 208, ProcessCoord(pipe=0, data=209, model=0): 209, ProcessCoord(pipe=0, data=210, model=0): 210, ProcessCoord(pipe=0, data=211, model=0): 211, ProcessCoord(pipe=0, data=212, model=0): 212, ProcessCoord(pipe=0, data=213, model=0): 213, ProcessCoord(pipe=0, data=214, model=0): 214, ProcessCoord(pipe=0, data=215, model=0): 215, ProcessCoord(pipe=0, data=216, model=0): 216, ProcessCoord(pipe=0, data=217, model=0): 217, ProcessCoord(pipe=0, data=218, model=0): 218, ProcessCoord(pipe=0, data=219, model=0): 219, ProcessCoord(pipe=0, data=220, model=0): 220, ProcessCoord(pipe=0, data=221, model=0): 221, ProcessCoord(pipe=0, data=222, model=0): 222, ProcessCoord(pipe=0, data=223, model=0): 223, ProcessCoord(pipe=0, data=224, model=0): 224, ProcessCoord(pipe=0, data=225, model=0): 225, ProcessCoord(pipe=0, data=226, model=0): 226, P 0: rocessCoord(pipe=0, data=227, model=0): 227, ProcessCoord(pipe=0, data=228, model=0): 228, ProcessCoord(pipe=0, data=229, model=0): 229, ProcessCoord(pipe=0, data=230, model=0): 230, ProcessCoord(pipe=0, data=231, model=0): 231, ProcessCoord(pipe=0, data=232, model=0): 232, ProcessCoord(pipe=0, data=233, model=0): 233, ProcessCoord(pipe=0, data=234, model=0): 234, ProcessCoord(pipe=0, data=235, model=0): 235, ProcessCoord(pipe=0, data=236, model=0): 236, ProcessCoord(pipe=0, data=237, model=0): 237, ProcessCoord(pipe=0, data=238, model=0): 238, ProcessCoord(pipe=0, data=239, model=0): 239, ProcessCoord(pipe=0, data=240, model=0): 240, ProcessCoord(pipe=0, data=241, model=0): 241, ProcessCoord(pipe=0, data=242, model=0): 242, ProcessCoord(pipe=0, data=243, model=0): 243, ProcessCoord(pipe=0, data=244, model=0): 244, ProcessCoord(pipe=0, data=245, model=0): 245, ProcessCoord(pipe=0, data=246, model=0): 246, ProcessCoord(pipe=0, data=247, model=0): 247, ProcessCoord(pipe=0, data=248, model=0): 248, ProcessCoord( 0: pipe=0, data=249, model=0): 249, ProcessCoord(pipe=0, data=250, model=0): 250, 
ProcessCoord(pipe=0, data=251, model=0): 251, ProcessCoord(pipe=0, data=252, model=0): 252, ProcessCoord(pipe=0, data=253, model=0): 253, ProcessCoord(pipe=0, data=254, model=0): 254, ProcessCoord(pipe=0, data=255, model=0): 255} 0: [2023-04-27 00:01:19,120] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer 0: stage=0 layers=33 0: 0: _to_float16 0: 1: EmbeddingPipe 0: 2: 0: 3: ParallelTransformerLayerPipe 0: 4: ParallelTransformerLayerPipe 0: 5: ParallelTransformerLayerPipe 0: 6: ParallelTransformerLayerPipe 0: 7: ParallelTransformerLayerPipe 0: 8: ParallelTransformerLayerPipe 0: 9: ParallelTransformerLayerPipe 0: 10: ParallelTransformerLayerPipe 0: 11: ParallelTransformerLayerPipe 0: 12: ParallelTransformerLayerPipe 0: 13: ParallelTransformerLayerPipe 0: 14: ParallelTransformerLayerPipe 0: 15: ParallelTransformerLayerPipe 0: 16: ParallelTransformerLayerPipe 0: 17: ParallelTransformerLayerPipe 0: 18: ParallelTransformerLayerPipe 0: 19: ParallelTransformerLayerPipe 0: 20: ParallelTransformerLayerPipe 0: 21: ParallelTransformerLayerPipe 0: 22: ParallelTransformerLayerPipe 0: 23: ParallelTransformerLayerPipe 0: 24: ParallelTransformerLayerPipe 0: 25: ParallelTransformerLayerPipe 0: 26: ParallelTransformerLayerPipe 0: 27: ParallelTransformerLayerPipe 0: 28: ParallelTransformerLayerPipe 0: 29: undo 0: 30: MixedFusedLayerNorm 0: 31: EmbeddingPipe 0: 32: float16_to_fp32 0: loss: CrossEntropy 0: [2023-04-27 00:01:19,354] [INFO] [utils.py:827:see_memory_usage] After Building Model 0: [2023-04-27 00:01:19,354] [INFO] [utils.py:828:see_memory_usage] MA 2.05 GB Max_MA 2.05 GB CA 2.19 GB Max_CA 2 GB 0: [2023-04-27 00:01:19,354] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 38.85 GB, percent = 7.7% 0: setting training iterations to 476837 0: > learning rate decay style: cosine 0: DeepSpeed is enabled. 
0: [2023-04-27 00:01:19,357] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown 0: [2023-04-27 00:01:29,637] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False 0: [2023-04-27 00:01:29,637] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer 0: [2023-04-27 00:01:29,637] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer 0: [2023-04-27 00:01:29,649] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam 0: [2023-04-27 00:01:29,650] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer 0: [2023-04-27 00:01:29,768] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer 0: [2023-04-27 00:01:29,768] [INFO] [utils.py:828:see_memory_usage] MA 2.04 GB Max_MA 2.06 GB CA 2.19 GB Max_CA 2 GB 0: [2023-04-27 00:01:29,768] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.65 GB, percent = 7.9% 0: ninja: no work to do. 
0: Time to load utils op: 0.18032479286193848 seconds 0: Time to load utils op: 0.20296335220336914 seconds 0: Time to load utils op: 0.20301294326782227 seconds 0: Time to load utils op: 0.2035980224609375 seconds 0: Time to load utils op: 0.20400452613830566 seconds 0: Time to load utils op: 0.20395374298095703 seconds 0: Time to load utils op: 0.20397329330444336 seconds 0: Time to load utils op: 0.2031240463256836 seconds 2: Time to load utils op: 0.21231484413146973 secondsTime to load utils op: 0.21254611015319824 secondsTime to load utils op: 0.2127852439880371 secondsTime to load utils op: 0.21322035789489746 seconds 2: 2: Time to load utils op: 0.21244215965270996 seconds 2: 2: 2: Time to load utils op: 0.21326065063476562 secondsTime to load utils op: 0.21307873725891113 seconds 2: Time to load utils op: 0.21307039260864258 seconds 2: 4: Time to load utils op: 0.21387338638305664 secondsTime to load utils op: 0.21323227882385254 seconds 4: 4: Time to load utils op: 0.21349024772644043 seconds 4: Time to load utils op: 0.21294212341308594 secondsTime to load utils op: 0.21366119384765625 secondsTime to load utils op: 0.2138195037841797 seconds 4: 4: Time to load utils op: 0.21288585662841797 seconds 4: 4: Time to load utils op: 0.21353983879089355 seconds 3: Time to load utils op: 0.2145984172821045 secondsTime to load utils op: 0.2145063877105713 seconds 3: 3: Time to load utils op: 0.21465468406677246 secondsTime to load utils op: 0.21458959579467773 secondsTime to load utils op: 0.21452116966247559 secondsTime to load utils op: 0.2145850658416748 seconds 3: 3: 3: Time to load utils op: 0.21436619758605957 seconds 3: Time to load utils op: 0.21456480026245117 seconds 3: 5: Time to load utils op: 0.21282362937927246 seconds 5: Time to load utils op: 0.21282124519348145 secondsTime to load utils op: 0.2128126621246338 secondsTime to load utils op: 0.212815523147583 seconds 5: 5: 5: Time to load utils op: 0.21324944496154785 seconds 5: Time to load utils 
op: 0.21280217170715332 secondsTime to load utils op: 0.2128450870513916 seconds 5: 5: Time to load utils op: 0.2128586769104004 seconds 1: Time to load utils op: 0.21535563468933105 secondsTime to load utils op: 0.2153630256652832 seconds 1: 1: Time to load utils op: 0.2153778076171875 secondsTime to load utils op: 0.21536827087402344 secondsTime to load utils op: 0.2153780460357666 seconds 1: 1: 1: Time to load utils op: 0.21538043022155762 seconds 1: Time to load utils op: 0.21538400650024414 secondsTime to load utils op: 0.2153942584991455 seconds 1: 8: Time to load utils op: 0.22347354888916016 seconds 8: Time to load utils op: 0.2234961986541748 seconds 8: Time to load utils op: 0.2234964370727539 seconds 8: Time to load utils op: 0.22351312637329102 seconds 8: Time to load utils op: 0.22351455688476562 seconds 8: Time to load utils op: 0.22352147102355957 seconds 8: Time to load utils op: 0.22353029251098633 seconds 8: Time to load utils op: 0.2235395908355713 seconds 9: Time to load utils op: 0.22359347343444824 secondsTime to load utils op: 0.22359132766723633 secondsTime to load utils op: 0.2235872745513916 secondsTime to load utils op: 0.22358942031860352 seconds 9: 9: 9: 9: Time to load utils op: 0.22359824180603027 secondsTime to load utils op: 0.22359585762023926 secondsTime to load utils op: 0.22360897064208984 secondsTime to load utils op: 0.22359704971313477 seconds 9: 9: 9: 7: Time to load utils op: 0.22583365440368652 secondsTime to load utils op: 0.2258288860321045 secondsTime to load utils op: 0.22583723068237305 seconds 7: 7: 7: Time to load utils op: 0.22585487365722656 seconds 7: Time to load utils op: 0.2258586883544922 seconds 7: Time to load utils op: 0.22585439682006836 seconds 6: Time to load utils op: 0.22632813453674316 seconds 6: Time to load utils op: 0.22632813453674316 secondsTime to load utils op: 0.22634124755859375 secondsTime to load utils op: 0.22634172439575195 seconds 6: 6: 6: Time to load utils op: 0.22634291648864746 
secondsTime to load utils op: 0.226334810256958 secondsTime to load utils op: 0.2263350486755371 seconds 6: 6: 6: Time to load utils op: 0.22634291648864746 seconds 7: Time to load utils op: 0.22587203979492188 seconds 7: Time to load utils op: 0.22587108612060547 seconds 11: Time to load utils op: 0.2224259376525879 secondsTime to load utils op: 0.22243523597717285 seconds 11: 11: Time to load utils op: 0.2224419116973877 seconds 11: Time to load utils op: 0.22244501113891602 secondsTime to load utils op: 0.22244572639465332 seconds 11: 11: Time to load utils op: 0.22246336936950684 seconds 11: Time to load utils op: 0.22246050834655762 seconds 11: Time to load utils op: 0.22246646881103516 seconds 15: Time to load utils op: 0.21935009956359863 seconds 15: Time to load utils op: 0.21936798095703125 secondsTime to load utils op: 0.21935796737670898 seconds 10: Time to load utils op: 0.22550702095031738 seconds 10: Time to load utils op: 0.2255113124847412 secondsTime to load utils op: 0.2255115509033203 secondsTime to load utils op: 0.22552084922790527 seconds 10: 10: 10: Time to load utils op: 0.22553396224975586 seconds 10: Time to load utils op: 0.2255244255065918 seconds 15: 15: Time to load utils op: 0.21936750411987305 secondsTime to load utils op: 0.2193586826324463 secondsTime to load utils op: 0.21936368942260742 seconds 15: Time to load utils op: 0.21936464309692383 seconds 15: 15: 15: Time to load utils op: 0.21936321258544922 seconds 10: Time to load utils op: 0.2255268096923828 secondsTime to load utils op: 0.22552824020385742 seconds 10: 14: Time to load utils op: 0.22139382362365723 seconds 14: Time to load utils op: 0.22140002250671387 seconds 14: Time to load utils op: 0.22141003608703613 secondsTime to load utils op: 0.22141003608703613 seconds 14: 14: Time to load utils op: 0.2214200496673584 seconds 14: Time to load utils op: 0.22142291069030762 secondsTime to load utils op: 0.22143173217773438 secondsTime to load utils op: 0.22143030166625977 
seconds 14: 14: 12: Time to load utils op: 0.22377514839172363 secondsTime to load utils op: 0.22378134727478027 secondsTime to load utils op: 0.22379016876220703 seconds 12: Time to load utils op: 0.223785400390625 seconds 12: 12: Time to load utils op: 0.22379493713378906 seconds 12: Time to load utils op: 0.22379302978515625 secondsTime to load utils op: 0.22380781173706055 seconds 12: Time to load utils op: 0.2238006591796875 seconds 12: 12: 13: Time to load utils op: 0.2222898006439209 secondsTime to load utils op: 0.22229480743408203 secondsTime to load utils op: 0.22230124473571777 seconds 13: 13: 13: Time to load utils op: 0.22230768203735352 seconds 13: Time to load utils op: 0.22231626510620117 seconds 13: Time to load utils op: 0.22231650352478027 secondsTime to load utils op: 0.2223353385925293 seconds 13: 13: Time to load utils op: 0.22233915328979492 seconds 19: Time to load utils op: 0.21543478965759277 secondsTime to load utils op: 0.21543312072753906 secondsTime to load utils op: 0.21543526649475098 secondsTime to load utils op: 0.2154390811920166 secondsTime to load utils op: 0.2154371738433838 seconds 19: 19: 19: Time to load utils op: 0.21544194221496582 seconds 19: 19: 19: Time to load utils op: 0.21544647216796875 seconds 19: Time to load utils op: 0.21545791625976562 seconds 21: Time to load utils op: 0.2179110050201416 seconds 21: Time to load utils op: 0.2167949676513672 secondsTime to load utils op: 0.21674180030822754 secondsTime to load utils op: 0.2167668342590332 seconds 21: 21: 21: Time to load utils op: 0.21701645851135254 secondsTime to load utils op: 0.21775579452514648 seconds 21: 21: Time to load utils op: 0.2168285846710205 secondsTime to load utils op: 0.2179245948791504 seconds 23: Time to load utils op: 0.21550774574279785 seconds 23: Time to load utils op: 0.2162637710571289 secondsTime to load utils op: 0.21554327011108398 seconds 23: 21: 23: Time to load utils op: 0.21528077125549316 seconds 23: Time to load utils op: 
0.21558785438537598 secondsTime to load utils op: 0.21419286727905273 seconds 23: 23: Time to load utils op: 0.21620631217956543 secondsTime to load utils op: 0.21493220329284668 seconds 23: 18: Time to load utils op: 0.21690988540649414 secondsTime to load utils op: 0.21691226959228516 seconds 18: 18: Time to load utils op: 0.21691489219665527 secondsTime to load utils op: 0.21691131591796875 secondsTime to load utils op: 0.21691203117370605 secondsTime to load utils op: 0.21692204475402832 seconds 18: 18: 18: 18: Time to load utils op: 0.2169172763824463 seconds 18: Time to load utils op: 0.21693706512451172 seconds 25: Time to load utils op: 0.2129974365234375 secondsTime to load utils op: 0.21332025527954102 seconds 25: 25: Time to load utils op: 0.21335148811340332 seconds 25: Time to load utils op: 0.21328377723693848 secondsTime to load utils op: 0.2141563892364502 secondsTime to load utils op: 0.21301579475402832 seconds 25: 25: 25: Time to load utils op: 0.21387982368469238 secondsTime to load utils op: 0.2146923542022705 seconds 25: 17: Time to load utils op: 0.21895623207092285 secondsTime to load utils op: 0.21895503997802734 seconds 17: Time to load utils op: 0.21895980834960938 secondsTime to load utils op: 0.2189648151397705 seconds 17: 17: 17: Time to load utils op: 0.21896672248840332 seconds 17: Time to load utils op: 0.21897506713867188 seconds 17: Time to load utils op: 0.21897149085998535 secondsTime to load utils op: 0.2189803123474121 seconds 17: 22: Time to load utils op: 0.21425461769104004 secondsTime to load utils op: 0.2142488956451416 seconds 22: Time to load utils op: 0.21425223350524902 seconds 22: 22: Time to load utils op: 0.21425700187683105 secondsTime to load utils op: 0.21425795555114746 seconds 22: Time to load utils op: 0.21425700187683105 secondsTime to load utils op: 0.21426033973693848 seconds 22: Time to load utils op: 0.21425414085388184 seconds 22: 22: 20: Time to load utils op: 0.21646976470947266 secondsTime to load 
utils op: 0.21646428108215332 secondsTime to load utils op: 0.216477632522583 secondsTime to load utils op: 0.21647357940673828 secondsTime to load utils op: 0.2164759635925293 seconds 20: 20: 20: 20: Time to load utils op: 0.21647882461547852 seconds 20: 20: Time to load utils op: 0.21648049354553223 seconds 20: Time to load utils op: 0.21650385856628418 seconds 26: Time to load utils op: 0.21425604820251465 secondsTime to load utils op: 0.21424603462219238 seconds 26: 26: Time to load utils op: 0.21426916122436523 secondsTime to load utils op: 0.2142493724822998 secondsTime to load utils op: 0.2141563892364502 seconds 26: 26: 26: Time to load utils op: 0.2142505645751953 seconds 26: Time to load utils op: 0.21410465240478516 secondsTime to load utils op: 0.2140967845916748 seconds 26: 24: Time to load utils op: 0.2138986587524414 secondsTime to load utils op: 0.2138965129852295 secondsTime to load utils op: 0.21390342712402344 seconds 24: Time to load utils op: 0.2138979434967041 secondsTime to load utils op: 0.2138974666595459 seconds 24: 24: Time to load utils op: 0.21390485763549805 seconds 24: 24: 24: Time to load utils op: 0.21390843391418457 seconds 24: Time to load utils op: 0.2139136791229248 seconds 29: Time to load utils op: 0.21602678298950195 secondsTime to load utils op: 0.21479344367980957 seconds 29: Time to load utils op: 0.21514511108398438 seconds 29: 29: Time to load utils op: 0.21611905097961426 secondsTime to load utils op: 0.2151339054107666 seconds 29: 29: Time to load utils op: 0.21484875679016113 seconds 29: Time to load utils op: 0.21564412117004395 secondsTime to load utils op: 0.21590757369995117 seconds 29: 27: Time to load utils op: 0.21515703201293945 seconds 27: Time to load utils op: 0.21516108512878418 secondsTime to load utils op: 0.2151637077331543 seconds 27: 27: Time to load utils op: 0.21516776084899902 secondsTime to load utils op: 0.21517395973205566 seconds 27: 27: Time to load utils op: 0.21516847610473633 seconds 27: 
Time to load utils op: 0.21518850326538086 seconds 27: Time to load utils op: 0.21518683433532715 seconds 28: Time to load utils op: 0.21462368965148926 secondsTime to load utils op: 0.21462249755859375 seconds 28: 28: Time to load utils op: 0.21464037895202637 secondsTime to load utils op: 0.21463680267333984 seconds 28: 28: Time to load utils op: 0.21463608741760254 seconds 28: Time to load utils op: 0.2146308422088623 seconds 28: Time to load utils op: 0.2146439552307129 secondsTime to load utils op: 0.2146451473236084 seconds 28: 30: Time to load utils op: 0.21250534057617188 secondsTime to load utils op: 0.212507963180542 secondsTime to load utils op: 0.21250629425048828 seconds 30: 30: Time to load utils op: 0.2125108242034912 secondsTime to load utils op: 0.21252059936523438 seconds 30: 30: Time to load utils op: 0.2125096321105957 seconds 30: 30: Time to load utils op: 0.21251916885375977 seconds 30: Time to load utils op: 0.21253299713134766 seconds 0: Time to load utils op: 0.0005810260772705078 secondsTime to load utils op: 0.0005843639373779297 seconds 0: Time to load utils op: 0.0005800724029541016 seconds 0: 0: Time to load utils op: 0.0006518363952636719 secondsTime to load utils op: 0.0006825923919677734 secondsTime to load utils op: 0.0006587505340576172 secondsTime to load utils op: 0.0006761550903320312 seconds 0: 0: 0: 2: Time to load utils op: 0.0007300376892089844 seconds 16: Time to load utils op: 0.24342083930969238 secondsTime to load utils op: 0.24403071403503418 seconds 16: 16: Time to load utils op: 0.24299097061157227 seconds 16: Time to load utils op: 0.24339675903320312 secondsTime to load utils op: 0.24337053298950195 seconds 16: Time to load utils op: 0.24443554878234863 seconds 16: 16: Time to load utils op: 0.2434077262878418 secondsTime to load utils op: 0.24335718154907227 seconds 16: 2: Time to load utils op: 0.0012049674987792969 seconds 2: Time to load utils op: 0.0011973381042480469 seconds 2: Time to load utils op: 
0.0012083053588867188 seconds 2: Time to load utils op: 0.001140594482421875 seconds 2: Time to load utils op: 0.0012238025665283203 seconds 2: Time to load utils op: 0.0012392997741699219 seconds 2: Time to load utils op: 0.0012538433074951172 seconds 31: Time to load utils op: 0.22028589248657227 secondsTime to load utils op: 0.22029614448547363 secondsTime to load utils op: 0.22018837928771973 secondsTime to load utils op: 0.2203056812286377 seconds 31: 31: 31: 31: Time to load utils op: 0.22029733657836914 secondsTime to load utils op: 0.2203068733215332 seconds 31: Time to load utils op: 0.22030305862426758 seconds 31: Time to load utils op: 0.22030973434448242 seconds 31: 4: Time to load utils op: 0.0009162425994873047 seconds 4: Time to load utils op: 0.0011217594146728516 seconds 4: Time to load utils op: 0.0014591217041015625 secondsTime to load utils op: 0.0014431476593017578 seconds 4: 4: Time to load utils op: 0.0013599395751953125 secondsTime to load utils op: 0.001455545425415039 seconds 4: 4: Time to load utils op: 0.0014216899871826172 seconds 4: Time to load utils op: 0.0014617443084716797 seconds 3: Time to load utils op: 0.0007944107055664062 secondsTime to load utils op: 0.0008008480072021484 seconds 3: 3: Time to load utils op: 0.0013070106506347656 secondsTime to load utils op: 0.0012235641479492188 seconds 3: 3: Time to load utils op: 0.0012645721435546875 seconds 3: Time to load utils op: 0.0012102127075195312 seconds 3: Time to load utils op: 0.0013196468353271484 secondsTime to load utils op: 0.001230001449584961 seconds 3: 5: Time to load utils op: 0.0007693767547607422 seconds 5: Time to load utils op: 0.0009846687316894531 seconds 5: Time to load utils op: 0.0009706020355224609 seconds 5: Time to load utils op: 0.0010678768157958984 seconds 5: Time to load utils op: 0.0011360645294189453 secondsTime to load utils op: 0.0011103153228759766 seconds 5: 5: Time to load utils op: 0.001127481460571289 seconds 5: Time to load utils op: 
0.0011720657348632812 seconds 1: Time to load utils op: 0.0007719993591308594 seconds 1: Time to load utils op: 0.0013072490692138672 seconds 1: Time to load utils op: 0.0012557506561279297 seconds 1: Time to load utils op: 0.0012862682342529297 seconds 1: Time to load utils op: 0.0012843608856201172 seconds 1: Time to load utils op: 0.001306295394897461 secondsTime to load utils op: 0.0013065338134765625 seconds 1: 1: Time to load utils op: 0.0013480186462402344 seconds 8: Time to load utils op: 0.0008378028869628906 seconds 8: Time to load utils op: 0.0008816719055175781 seconds 8: Time to load utils op: 0.0011212825775146484 seconds 8: Time to load utils op: 0.0012941360473632812 seconds 8: Time to load utils op: 0.0013077259063720703 seconds 8: Time to load utils op: 0.0012454986572265625 seconds 8: Time to load utils op: 0.0011858940124511719 seconds 8: Time to load utils op: 0.0013241767883300781 seconds 25: Time to load utils op: 0.000518798828125 seconds 25: Time to load utils op: 0.0005269050598144531 secondsTime to load utils op: 0.0005354881286621094 seconds 25: 25: Time to load utils op: 0.0005888938903808594 seconds 25: Time to load utils op: 0.00042438507080078125 seconds 23: Time to load utils op: 0.0005424022674560547 seconds 25: Time to load utils op: 0.0005681514739990234 seconds 25: Time to load utils op: 0.0005998611450195312 seconds 25: Time to load utils op: 0.0005767345428466797 seconds 23: Time to load utils op: 0.0004038810729980469 seconds 23: Time to load utils op: 0.0005297660827636719 seconds 23: Time to load utils op: 0.0005559921264648438 secondsTime to load utils op: 0.0005221366882324219 seconds 23: 23: Time to load utils op: 0.0005848407745361328 seconds 23: Time to load utils op: 0.0005779266357421875 seconds 23: Time to load utils op: 0.0005581378936767578 seconds 29: Time to load utils op: 0.000926971435546875 seconds 29: Time to load utils op: 0.0010900497436523438 seconds 29: Time to load utils op: 0.0013279914855957031 
seconds 29: Time to load utils op: 0.0012598037719726562 seconds 29: Time to load utils op: 0.0014190673828125 seconds 29: Time to load utils op: 0.0013477802276611328 seconds 29: Time to load utils op: 0.0013766288757324219 seconds 29: Time to load utils op: 0.0014412403106689453 seconds 22: Time to load utils op: 0.0009186267852783203 seconds 22: Time to load utils op: 0.0009641647338867188 seconds 22: Time to load utils op: 0.0010576248168945312 seconds 22: Time to load utils op: 0.0012276172637939453 seconds 22: Time to load utils op: 0.0011882781982421875 secondsTime to load utils op: 0.0011565685272216797 seconds 22: 22: Time to load utils op: 0.001115560531616211 seconds 22: Time to load utils op: 0.0012416839599609375 seconds 17: Time to load utils op: 0.0008168220520019531 seconds 17: Time to load utils op: 0.0010228157043457031 seconds 24: Time to load utils op: 0.0008730888366699219 seconds 17: Time to load utils op: 0.0012819766998291016 seconds 17: Time to load utils op: 0.0013322830200195312 seconds 17: Time to load utils op: 0.0012679100036621094 seconds 17: Time to load utils op: 0.0012252330780029297 secondsTime to load utils op: 0.0012717247009277344 seconds 17: 17: Time to load utils op: 0.0012280941009521484 seconds 24: Time to load utils op: 0.001375436782836914 seconds 24: Time to load utils op: 0.0013301372528076172 seconds 24: Time to load utils op: 0.001340627670288086 seconds 24: Time to load utils op: 0.0013480186462402344 seconds 24: Time to load utils op: 0.0013513565063476562 secondsTime to load utils op: 0.0013666152954101562 seconds 24: 24: Time to load utils op: 0.0013725757598876953 seconds 19: Time to load utils op: 0.0010290145874023438 seconds 10: Time to load utils op: 0.0010361671447753906 seconds 10: Time to load utils op: 0.0009617805480957031 seconds 19: Time to load utils op: 0.0010654926300048828 secondsTime to load utils op: 0.0010530948638916016 seconds 19: 14: Time to load utils op: 0.0010101795196533203 seconds 19: 
Time to load utils op: 0.0012712478637695312 secondsTime to load utils op: 0.0013079643249511719 seconds 19: 19: Time to load utils op: 0.0012958049774169922 seconds 19: Time to load utils op: 0.0012836456298828125 seconds 19: Time to load utils op: 0.001367330551147461 seconds 10: Time to load utils op: 0.0013916492462158203 seconds 10: Time to load utils op: 0.001386880874633789 seconds 10: Time to load utils op: 0.0013859272003173828 seconds 10: Time to load utils op: 0.0013713836669921875 seconds 15: Time to load utils op: 0.0009295940399169922 seconds 15: Time to load utils op: 0.0009114742279052734 seconds 15: Time to load utils op: 0.0009520053863525391 seconds 10: Time to load utils op: 0.0014925003051757812 seconds 10: Time to load utils op: 0.0014319419860839844 seconds 13: Time to load utils op: 0.0008394718170166016 seconds 14: Time to load utils op: 0.001734018325805664 seconds 11: Time to load utils op: 0.0008242130279541016 seconds 11: Time to load utils op: 0.0008673667907714844 seconds 11: Time to load utils op: 0.0008549690246582031 seconds 14: Time to load utils op: 0.0017282962799072266 seconds 12: Time to load utils op: 0.0010421276092529297 seconds 14: Time to load utils op: 0.0016624927520751953 seconds 9: Time to load utils op: 0.0011742115020751953 seconds 14: Time to load utils op: 0.0016891956329345703 seconds 14: Time to load utils op: 0.0016665458679199219 secondsTime to load utils op: 0.0016918182373046875 seconds 14: 15: Time to load utils op: 0.0013499259948730469 secondsTime to load utils op: 0.0013148784637451172 seconds 15: 15: Time to load utils op: 0.0013158321380615234 seconds 15: Time to load utils op: 0.001308441162109375 seconds 14: Time to load utils op: 0.0017061233520507812 seconds 15: Time to load utils op: 0.0013644695281982422 seconds 13: Time to load utils op: 0.0011947154998779297 secondsTime to load utils op: 0.0012104511260986328 seconds 13: 11: Time to load utils op: 0.0011484622955322266 seconds 9: Time to load 
utils op: 0.0013048648834228516 seconds 12: Time to load utils op: 0.0013239383697509766 seconds 6: Time to load utils op: 0.0010235309600830078 seconds 26: Time to load utils op: 0.0006616115570068359 seconds 11: Time to load utils op: 0.0014290809631347656 secondsTime to load utils op: 0.0013971328735351562 seconds 11: 13: Time to load utils op: 0.0013637542724609375 secondsTime to load utils op: 0.0013284683227539062 seconds 13: 9: Time to load utils op: 0.0014138221740722656 seconds 27: Time to load utils op: 0.0006413459777832031 seconds 11: Time to load utils op: 0.0013790130615234375 seconds 13: Time to load utils op: 0.0013458728790283203 seconds 13: Time to load utils op: 0.0014045238494873047 seconds 13: Time to load utils op: 0.0014073848724365234 seconds 9: Time to load utils op: 0.0013413429260253906 seconds 9: Time to load utils op: 0.001392364501953125 seconds 9: Time to load utils op: 0.001432657241821289 seconds 9: Time to load utils op: 0.0014064311981201172 seconds 11: Time to load utils op: 0.0014503002166748047 seconds 9: Time to load utils op: 0.0014462471008300781 seconds 26: Time to load utils op: 0.000888824462890625 seconds 26: Time to load utils op: 0.0009779930114746094 seconds 12: Time to load utils op: 0.0017719268798828125 seconds 12: Time to load utils op: 0.001760721206665039 seconds 18: Time to load utils op: 0.0007894039154052734 seconds 12: Time to load utils op: 0.0017316341400146484 seconds 12: Time to load utils op: 0.0017697811126708984 seconds 12: Time to load utils op: 0.0017626285552978516 seconds 27: Time to load utils op: 0.0009698867797851562 seconds 27: Time to load utils op: 0.001003265380859375 seconds 12: Time to load utils op: 0.0018453598022460938 seconds 27: Time to load utils op: 0.001001119613647461 seconds 20: Time to load utils op: 0.0010972023010253906 seconds 27: Time to load utils op: 0.00107574462890625 seconds 27: Time to load utils op: 0.0011017322540283203 secondsTime to load utils op: 
0.0010101795196533203 seconds 20: Time to load utils op: 0.0010492801666259766 seconds 27: 6: Time to load utils op: 0.0016591548919677734 seconds 6: Time to load utils op: 0.0017375946044921875 seconds 20: Time to load utils op: 0.0010445117950439453 seconds 21: Time to load utils op: 0.0009748935699462891 seconds 27: Time to load utils op: 0.0010814666748046875 seconds 6: Time to load utils op: 0.0016438961029052734 seconds 6: Time to load utils op: 0.0016887187957763672 seconds 6: Time to load utils op: 0.0016834735870361328 seconds 6: Time to load utils op: 0.0016748905181884766 seconds 18: Time to load utils op: 0.0011594295501708984 seconds 26: Time to load utils op: 0.0013501644134521484 seconds 7: Time to load utils op: 0.0011644363403320312 seconds 26: Time to load utils op: 0.0012767314910888672 seconds 6: Time to load utils op: 0.00168609619140625 seconds 26: Time to load utils op: 0.0013184547424316406 seconds 26: Time to load utils op: 0.0013458728790283203 seconds 20: Time to load utils op: 0.0013551712036132812 seconds 20: Time to load utils op: 0.0012717247009277344 seconds 20: Time to load utils op: 0.0012431144714355469 seconds 7: Time to load utils op: 0.0012698173522949219 seconds 7: Time to load utils op: 0.001161813735961914 seconds 7: Time to load utils op: 0.001127481460571289 seconds 20: Time to load utils op: 0.0013189315795898438 seconds 18: Time to load utils op: 0.0014095306396484375 secondsTime to load utils op: 0.0013554096221923828 seconds 18: 20: Time to load utils op: 0.0013110637664794922 seconds 26: Time to load utils op: 0.0014090538024902344 seconds 7: Time to load utils op: 0.0012538433074951172 seconds 7: Time to load utils op: 0.0012896060943603516 seconds 18: Time to load utils op: 0.0013589859008789062 secondsTime to load utils op: 0.001398324966430664 secondsTime to load utils op: 0.0013840198516845703 seconds 18: 18: 7: Time to load utils op: 0.0011670589447021484 seconds 7: Time to load utils op: 0.0012137889862060547 
seconds 18: Time to load utils op: 0.0013577938079833984 seconds 21: Time to load utils op: 0.0014269351959228516 seconds 21: Time to load utils op: 0.001348733901977539 seconds 21: Time to load utils op: 0.0014073848724365234 seconds 21: Time to load utils op: 0.0014641284942626953 seconds 21: Time to load utils op: 0.0015056133270263672 seconds 21: Time to load utils op: 0.0014579296112060547 seconds 21: Time to load utils op: 0.0014927387237548828 seconds 16: Time to load utils op: 0.0009679794311523438 seconds 30: Time to load utils op: 0.0008895397186279297 secondsTime to load utils op: 0.0007960796356201172 seconds 30: 16: Time to load utils op: 0.0012006759643554688 seconds 16: Time to load utils op: 0.0011951923370361328 seconds 16: Time to load utils op: 0.0011703968048095703 seconds 16: Time to load utils op: 0.0012259483337402344 secondsTime to load utils op: 0.0012247562408447266 seconds 16: 16: Time to load utils op: 0.0011959075927734375 seconds 16: Time to load utils op: 0.00121307373046875 seconds 28: Time to load utils op: 0.0008070468902587891 seconds 30: Time to load utils op: 0.0011947154998779297 seconds 30: Time to load utils op: 0.0011930465698242188 secondsTime to load utils op: 0.0011737346649169922 seconds 30: 30: Time to load utils op: 0.001222848892211914 secondsTime to load utils op: 0.001201629638671875 seconds 30: 30: Time to load utils op: 0.0012598037719726562 seconds 28: Time to load utils op: 0.0009381771087646484 seconds 28: Time to load utils op: 0.0009450912475585938 seconds 28: Time to load utils op: 0.0011951923370361328 seconds 28: Time to load utils op: 0.001127004623413086 seconds 28: Time to load utils op: 0.0011293888092041016 seconds 28: Time to load utils op: 0.001119852066040039 seconds 28: Time to load utils op: 0.001211404800415039 seconds 31: Time to load utils op: 0.0009756088256835938 seconds 31: Time to load utils op: 0.0014069080352783203 seconds 31: Time to load utils op: 0.0013911724090576172 seconds 31: Time 
to load utils op: 0.0013782978057861328 seconds 31: Time to load utils op: 0.0014371871948242188 seconds 31: Time to load utils op: 0.001409769058227539 seconds 31: Time to load utils op: 0.0014142990112304688 seconds 31: Time to load utils op: 0.001417398452758789 seconds 0: [2023-04-27 00:01:30,097] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 0: [2023-04-27 00:01:30,098] [INFO] [utils.py:828:see_memory_usage] MA 2.04 GB Max_MA 2.04 GB CA 2.19 GB Max_CA 2 GB 0: [2023-04-27 00:01:30,098] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.72 GB, percent = 7.9% 0: [2023-04-27 00:01:30,214] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 0: [2023-04-27 00:01:30,214] [INFO] [utils.py:828:see_memory_usage] MA 4.22 GB Max_MA 4.22 GB CA 5.44 GB Max_CA 5 GB 0: [2023-04-27 00:01:30,214] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.72 GB, percent = 7.9% 0: [2023-04-27 00:01:30,313] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 0: [2023-04-27 00:01:30,314] [INFO] [utils.py:828:see_memory_usage] MA 4.22 GB Max_MA 4.22 GB CA 5.44 GB Max_CA 5 GB 0: [2023-04-27 00:01:30,314] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.72 GB, percent = 7.9% 0: [2023-04-27 00:01:30,414] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 0: [2023-04-27 00:01:30,415] [INFO] [utils.py:828:see_memory_usage] MA 6.14 GB Max_MA 6.14 GB CA 8.31 GB Max_CA 8 GB 0: [2023-04-27 00:01:30,415] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.72 GB, percent = 7.9% 0: [2023-04-27 00:01:30,513] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 0: [2023-04-27 00:01:30,514] [INFO] [utils.py:828:see_memory_usage] MA 6.14 GB Max_MA 6.14 GB CA 8.31 GB Max_CA 8 GB 0: [2023-04-27 00:01:30,514] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.72 GB, percent = 7.9% 0: [2023-04-27 00:01:30,616] [INFO] 
[utils.py:827:see_memory_usage] after initializing group 2 0: [2023-04-27 00:01:30,616] [INFO] [utils.py:828:see_memory_usage] MA 6.14 GB Max_MA 6.14 GB CA 8.31 GB Max_CA 8 GB 0: [2023-04-27 00:01:30,616] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.72 GB, percent = 7.9% 0: [2023-04-27 00:01:30,714] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer 0: [2023-04-27 00:01:30,714] [INFO] [utils.py:828:see_memory_usage] MA 6.14 GB Max_MA 6.14 GB CA 8.31 GB Max_CA 8 GB 0: [2023-04-27 00:01:30,714] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.72 GB, percent = 7.9% 0: [2023-04-27 00:01:30,818] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer 0: [2023-04-27 00:01:30,819] [INFO] [utils.py:828:see_memory_usage] MA 6.17 GB Max_MA 6.17 GB CA 8.31 GB Max_CA 8 GB 0: [2023-04-27 00:01:30,819] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.72 GB, percent = 7.9% 0: [2023-04-27 00:01:30,918] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer 0: [2023-04-27 00:01:30,918] [INFO] [utils.py:828:see_memory_usage] MA 6.17 GB Max_MA 6.17 GB CA 8.31 GB Max_CA 8 GB 0: [2023-04-27 00:01:30,918] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 39.72 GB, percent = 7.9% 0: [2023-04-27 00:01:30,919] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam 0: [2023-04-27 00:01:30,919] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler 0: [2023-04-27 00:01:30,919] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = 0: [2023-04-27 00:01:30,919] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 0: [2023-04-27 00:01:30,919] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] activation_checkpointing_config { 0: "partition_activations": false, 0: 
"contiguous_memory_optimization": false, 0: "cpu_checkpointing": false, 0: "number_checkpoints": null, 0: "synchronize_checkpoint_boundary": false, 0: "profile": false 0: } 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] amp_enabled .................. False 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] amp_params ................... False 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] autotuning_config ............ { 0: "enabled": false, 0: "start_step": null, 0: "end_step": null, 0: "metric_path": null, 0: "arg_mappings": null, 0: "metric": "throughput", 0: "model_info": null, 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", 0: "overwrite": true, 0: "fast": true, 0: "start_profile_step": 3, 0: "end_profile_step": 5, 0: "tuner_type": "gridsearch", 0: "tuner_early_stopping": 5, 0: "tuner_num_trials": 50, 0: "model_info_path": null, 0: "mp_size": 1, 0: "max_train_batch_size": null, 0: "min_train_batch_size": 1, 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, 0: "min_train_micro_batch_size_per_gpu": 1, 0: "num_tuning_micro_batch_sizes": 3 0: } 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] bfloat16_enabled ............. True 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] comms_config ................. 
0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] communication_data_type ...... None 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] curriculum_enabled ........... False 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] curriculum_params ............ False 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] dataloader_drop_last ......... False 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] disable_allgather ............ False 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] dump_state ................... False 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None 0: [2023-04-27 00:01:30,920] [INFO] [config.py:1011:print] eigenvalue_enabled ...........
False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ bert.encoder.layer 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] elasticity_enabled ........... False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] flops_profiler_config ........ { 0: "enabled": false, 0: "profile_step": 1, 0: "module_depth": -1, 0: "top_modules": 1, 0: "detailed": true, 0: "output_file": null 0: } 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] fp16_auto_cast ............... None 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] fp16_enabled ................. False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] global_rank .................. 0 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] load_universal_checkpoint .... False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] loss_scale ................... 
1.0 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] memory_breakdown ............. False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] monitor_config ............... 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] nebula_config ................ { 0: "enabled": false, 0: "persistent_storage_path": null, 0: "persistent_time_interval": 100, 0: "num_of_version_in_retention": 2, 0: "enable_nebula_load": true, 0: "load_path": null 0: } 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] optimizer_name ............... None 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] optimizer_params ............. None 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] pld_enabled .................. False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] pld_params ................... False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] prescale_gradients ........... False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] scheduler_name ............... None 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] scheduler_params ............. None 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] sparse_attention ............. None 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] steps_per_print .............. 2000 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] train_batch_size ............. 256 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 1 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] use_node_local_storage ....... 
False 0: [2023-04-27 00:01:30,921] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False 0: [2023-04-27 00:01:30,922] [INFO] [config.py:1011:print] world_size ................... 256 0: [2023-04-27 00:01:30,922] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False 0: [2023-04-27 00:01:30,922] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False 0: [2023-04-27 00:01:30,922] [INFO] [config.py:1011:print] zero_enabled ................. False 0: [2023-04-27 00:01:30,922] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 0: [2023-04-27 00:01:30,922] [INFO] [config.py:996:print_user_config] json = { 0: "train_micro_batch_size_per_gpu": 1, 0: "train_batch_size": 256, 0: "gradient_clipping": 1.0, 0: "zero_optimization": { 0: "stage": 0 0: }, 0: "bf16": { 0: "enabled": true 0: }, 0: "steps_per_print": 2.000000e+03, 0: "wall_clock_breakdown": false 0: } 0: Time to load utils op: 0.00048351287841796875 seconds 0: [2023-04-27 00:01:30,922] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=1 0: [2023-04-27 00:01:30,935] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=33 [0, 33) STAGE_PARAMS=1096338432 (1096.338M) TOTAL_PARAMS=1096338432 (1096.338M) UNIQUE_PARAMS=1096338432 (1096.338M) 0: [2023-04-27 00:01:30,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
0: [2023-04-27 00:01:30,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
20: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
20: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
30: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
16: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
26: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
18: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
17: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
5: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
28: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
29: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
22: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
9: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
11: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
13: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
27: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
6: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
11: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 11: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 11: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
11: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 11: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 11: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
8: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 11: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
1: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
21: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 3: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 3: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 3: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
3: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 3: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 3: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 3: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 3: [2023-04-27 00:01:30,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
0: [2023-04-27 00:01:30,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 0: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 0: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
2: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 24: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
16: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:30,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
14: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 2: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
17: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
14: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 2: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 2: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
22: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 18: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 24: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
4: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 29: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 29: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 29: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 29: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 16: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
5: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 26: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
5: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 20: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 29: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 26: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
4: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 12: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 24: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
29: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 12: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 20: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 29: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 16: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 5: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 24: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 29: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
26: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
26: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 29: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 14: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
22: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
13: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:30,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 17: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
17: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
17: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 1: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 17: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
5: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 14: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 5: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
31: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 31: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
5: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 31: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 5: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 25: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 30: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 22: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 27: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 27: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 27: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 27: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 27: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 3: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 10: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 10: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
9: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 27: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 6: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 5: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 27: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 29: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 6: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 9: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 5: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 27: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
27: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 27: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 28: [2023-04-27 00:01:30,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 3: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 18: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
4: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 5: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 27: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 27: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
28: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 28: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 27: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 5: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 
15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 19: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 4: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 1: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 21: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 5: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 19: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 11: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 7: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt... 13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 4: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 5: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 
13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt. 13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
1: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
13: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
29: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
29: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
1: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
1: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
8: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
9: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
29: [2023-04-27 00:01:30,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
1: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
13: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
29: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
1: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
1: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
29: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
28: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
6: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
6: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
1: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
29: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
6: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
27: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
28: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
1: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
3: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
6: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
27: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
29: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
25: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
25: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
1: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
15: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
27: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
29: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
28: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
21: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
1: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
7: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
3: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
15: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
27: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
28: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
28: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
6: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
1: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
25: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
7: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
15: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
1: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
27: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
21: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
11: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
15: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
28: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
3: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
25: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
27: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
7: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
15: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
21: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
28: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
29: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
21: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
6: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
27: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
29: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
25: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
7: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
28: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
6: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
15: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
21: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
10: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
27: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
27: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
29: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
28: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
6: [2023-04-27 00:01:30,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
9: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
27: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
6: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
15: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
15: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
29: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
28: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
6: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
6: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
10: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
15: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
27: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
27: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
6: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
28: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
28: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
28: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
6: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
15: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
27: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
15: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
15: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
28: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
10: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
10: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
9: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
15: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
27: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
9: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
28: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
6: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
29: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
27: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
10: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
15: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
29: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
28: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/mp_rank_00_model_states.pt.
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
6: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt...
27: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 6: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
21: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 7: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 25: [2023-04-27 00:01:30,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:31,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:31,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:31,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:31,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
11: [2023-04-27 00:01:31,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
24: [2023-04-27 00:01:31,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
10: [2023-04-27 00:01:31,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
3: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
16: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
28: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
31: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 24: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
24: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 24: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
29: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 24: [2023-04-27 00:01:31,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
18: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
18: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
3: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:31,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:31,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
29: [2023-04-27 00:01:31,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:31,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:31,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 28: [2023-04-27 00:01:31,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 28: [2023-04-27 00:01:31,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 28: [2023-04-27 00:01:31,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 28: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
31: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 28: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
1: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
31: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 28: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:31,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:31,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
29: [2023-04-27 00:01:31,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 28: [2023-04-27 00:01:31,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
17: [2023-04-27 00:01:31,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:31,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:31,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:31,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:31,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:31,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:31,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
1: [2023-04-27 00:01:31,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:31,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
19: [2023-04-27 00:01:31,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:31,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:31,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:31,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 21: [2023-04-27 00:01:31,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 21: [2023-04-27 00:01:31,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 21: [2023-04-27 00:01:31,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 21: [2023-04-27 00:01:31,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
21: [2023-04-27 00:01:31,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 21: [2023-04-27 00:01:31,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 21: [2023-04-27 00:01:31,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
19: [2023-04-27 00:01:31,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:31,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:31,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:31,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:31,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:31,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:31,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:31,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:31,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:31,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
26: [2023-04-27 00:01:31,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:31,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:31,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:31,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:31,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:31,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 21: [2023-04-27 00:01:31,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:31,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
8: [2023-04-27 00:01:31,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:31,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:31,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:31,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:31,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
8: [2023-04-27 00:01:31,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:31,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:31,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:31,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:31,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:31,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:31,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:31,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
24: [2023-04-27 00:01:31,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
11: [2023-04-27 00:01:31,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:31,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:31,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:31,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:31,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:31,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
11: [2023-04-27 00:01:31,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:31,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
3: [2023-04-27 00:01:31,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
31: [2023-04-27 00:01:31,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
0: [2023-04-27 00:01:31,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
31: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
28: [2023-04-27 00:01:31,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
15: [2023-04-27 00:01:31,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 24: [2023-04-27 00:01:31,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:31,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 9: [2023-04-27 00:01:31,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 9: [2023-04-27 00:01:31,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
18: [2023-04-27 00:01:31,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 28: [2023-04-27 00:01:31,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:31,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 18: [2023-04-27 00:01:31,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:31,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 9: [2023-04-27 00:01:31,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 9: [2023-04-27 00:01:31,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 18: [2023-04-27 00:01:31,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 10: [2023-04-27 00:01:31,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
29: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 29: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
10: [2023-04-27 00:01:31,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:31,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 5: [2023-04-27 00:01:31,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
24: [2023-04-27 00:01:31,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 3: [2023-04-27 00:01:31,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:31,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 24: [2023-04-27 00:01:31,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 9: [2023-04-27 00:01:31,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 9: [2023-04-27 00:01:31,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 9: [2023-04-27 00:01:31,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 9: [2023-04-27 00:01:31,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 9: [2023-04-27 00:01:31,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
3: [2023-04-27 00:01:31,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:31,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 18: [2023-04-27 00:01:31,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:31,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:31,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 16: [2023-04-27 00:01:31,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:31,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:31,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 5: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
18: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
0: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 18: [2023-04-27 00:01:31,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:31,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:31,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:31,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
21: [2023-04-27 00:01:31,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-04-27 00:01:31,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 3: [2023-04-27 00:01:31,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:31,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:31,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 24: [2023-04-27 00:01:31,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
16: [2023-04-27 00:01:31,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 16: [2023-04-27 00:01:31,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:31,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 5: [2023-04-27 00:01:31,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 16: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
29: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
14: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 16: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
29: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 10: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 9: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:31,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 28: [2023-04-27 00:01:31,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 31: [2023-04-27 00:01:31,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:31,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
31: [2023-04-27 00:01:31,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 31: [2023-04-27 00:01:31,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 8: [2023-04-27 00:01:31,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 31: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 31: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
1: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 28: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 15: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 1: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
30: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 22: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 22: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 9: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 22: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 22: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
2: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 25: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 31: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
20: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 25: [2023-04-27 00:01:31,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
12: [2023-04-27 00:01:31,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 12: [2023-04-27 00:01:31,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 28: [2023-04-27 00:01:31,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 2: [2023-04-27 00:01:31,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
19: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 20: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
22: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 27: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 21: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 21: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
27: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
14: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 13: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 28: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
13: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 15: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 13: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:31,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:31,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 6: [2023-04-27 00:01:31,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 14: [2023-04-27 00:01:31,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
21: [2023-04-27 00:01:31,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 21: [2023-04-27 00:01:31,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 4: [2023-04-27 00:01:31,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 13: [2023-04-27 00:01:31,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 13: [2023-04-27 00:01:31,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:31,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 20: [2023-04-27 00:01:31,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:31,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
19: [2023-04-27 00:01:31,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 20: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 28: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 27: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 29: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 27: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
21: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 27: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 27: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 20: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 7: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
26: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 1: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 2: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 30: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 30: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 30: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
30: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 12: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
25: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 25: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 25: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 25: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 25: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 25: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 25: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 13: [2023-04-27 00:01:31,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
13: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 20: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 6: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 6: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 6: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 20: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 13: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 6: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
7: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 27: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 27: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 7: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 27: [2023-04-27 00:01:31,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
20: [2023-04-27 00:01:31,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:31,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 26: [2023-04-27 00:01:31,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 10: [2023-04-27 00:01:31,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 19: [2023-04-27 00:01:31,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:31,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:31,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:31,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:31,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 19: [2023-04-27 00:01:31,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
2: [2023-04-27 00:01:31,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 23: [2023-04-27 00:01:31,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 2: [2023-04-27 00:01:31,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:31,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:31,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:31,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 25: [2023-04-27 00:01:31,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:31,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
8: [2023-04-27 00:01:31,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 25: [2023-04-27 00:01:31,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:31,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:31,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 30: [2023-04-27 00:01:31,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 25: [2023-04-27 00:01:31,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 25: [2023-04-27 00:01:31,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 21: [2023-04-27 00:01:31,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
0: [2023-04-27 00:01:31,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:31,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 10: [2023-04-27 00:01:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 8: [2023-04-27 00:01:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 7: [2023-04-27 00:01:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 12: [2023-04-27 00:01:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 7: [2023-04-27 00:01:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 7: [2023-04-27 00:01:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
7: [2023-04-27 00:01:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-04-27 00:01:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:31,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 12: [2023-04-27 00:01:31,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 12: [2023-04-27 00:01:31,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 17: [2023-04-27 00:01:31,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:31,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:31,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 
1: [2023-04-27 00:01:31,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 12: [2023-04-27 00:01:31,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt... 11: [2023-04-27 00:01:31,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:31,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:31,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:31,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
8: [2023-04-27 00:01:31,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,662] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:31,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 26: [2023-04-27 00:01:31,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:31,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 17: [2023-04-27 00:01:31,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:31,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
17: [2023-04-27 00:01:31,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 8: [2023-04-27 00:01:31,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 19: [2023-04-27 00:01:31,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:31,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:31,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:31,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
19: [2023-04-27 00:01:31,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:31,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:31,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:31,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:31,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:31,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 19: [2023-04-27 00:01:31,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 19: [2023-04-27 00:01:31,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:31,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:31,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
21: [2023-04-27 00:01:31,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:31,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:31,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:31,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
9: [2023-04-27 00:01:31,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:31,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:31,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 11: [2023-04-27 00:01:31,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:31,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
23: [2023-04-27 00:01:31,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 8: [2023-04-27 00:01:31,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:31,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
9: [2023-04-27 00:01:31,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 8: [2023-04-27 00:01:31,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 8: [2023-04-27 00:01:31,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:31,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
27: [2023-04-27 00:01:31,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
20: [2023-04-27 00:01:31,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
6: [2023-04-27 00:01:31,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 9: [2023-04-27 00:01:31,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:31,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
30: [2023-04-27 00:01:31,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 30: [2023-04-27 00:01:31,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 25: [2023-04-27 00:01:31,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 25: [2023-04-27 00:01:31,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
9: [2023-04-27 00:01:31,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 25: [2023-04-27 00:01:31,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
20: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 20: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:31,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 30: [2023-04-27 00:01:31,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
20: [2023-04-27 00:01:31,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:31,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:31,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
27: [2023-04-27 00:01:31,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 7: [2023-04-27 00:01:31,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 30: [2023-04-27 00:01:31,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 30: [2023-04-27 00:01:31,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 30: [2023-04-27 00:01:31,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 25: [2023-04-27 00:01:31,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
25: [2023-04-27 00:01:31,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 30: [2023-04-27 00:01:31,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 27: [2023-04-27 00:01:31,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:31,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 23: [2023-04-27 00:01:31,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 2: [2023-04-27 00:01:31,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 5: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
4: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 4: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:31,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
14: [2023-04-27 00:01:31,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:31,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:31,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 12: [2023-04-27 00:01:31,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 9: [2023-04-27 00:01:31,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 
14: [2023-04-27 00:01:31,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 22: [2023-04-27 00:01:31,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:31,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
2: [2023-04-27 00:01:31,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:31,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:31,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:31,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:31,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:31,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:31,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
6: [2023-04-27 00:01:31,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:31,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:31,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:31,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:31,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:31,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:31,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:31,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:31,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
9: [2023-04-27 00:01:31,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:31,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:31,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:31,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:31,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:31,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 13: [2023-04-27 00:01:31,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
12: [2023-04-27 00:01:31,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 30: [2023-04-27 00:01:31,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:31,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 14: [2023-04-27 00:01:31,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:31,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:31,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_01-model_00-model_states.pt. 6: [2023-04-27 00:01:31,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
27: [2023-04-27 00:01:31,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:31,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:31,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:31,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:31,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:31,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:31,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
30: [2023-04-27 00:01:31,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:31,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:31,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
27: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:31,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:31,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:31,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:31,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
2: [2023-04-27 00:01:31,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:31,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
7: [2023-04-27 00:01:31,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:31,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:31,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
15: [2023-04-27 00:01:31,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
15: [2023-04-27 00:01:31,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
15: [2023-04-27 00:01:31,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 31: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 15: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 31: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 31: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
31: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 31: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 31: [2023-04-27 00:01:31,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 15: [2023-04-27 00:01:31,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:31,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:31,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:31,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:31,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:31,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:31,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
15: [2023-04-27 00:01:31,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:31,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
18: [2023-04-27 00:01:31,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:31,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:31,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:31,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:31,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
31: [2023-04-27 00:01:31,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:31,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:31,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:31,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:31,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 18: [2023-04-27 00:01:31,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 18: [2023-04-27 00:01:31,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 18: [2023-04-27 00:01:31,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 18: [2023-04-27 00:01:31,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 18: [2023-04-27 00:01:31,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
18: [2023-04-27 00:01:31,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 18: [2023-04-27 00:01:31,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:31,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
24: [2023-04-27 00:01:31,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:31,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:31,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:31,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:31,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:31,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:31,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 18: [2023-04-27 00:01:31,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
18: [2023-04-27 00:01:31,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:31,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:31,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
18: [2023-04-27 00:01:31,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:31,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:31,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:31,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:31,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:31,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
18: [2023-04-27 00:01:32,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
3: [2023-04-27 00:01:32,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:32,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:32,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:32,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:32,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:32,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:32,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:32,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 24: [2023-04-27 00:01:32,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
24: [2023-04-27 00:01:32,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 3: [2023-04-27 00:01:32,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
3: [2023-04-27 00:01:32,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 16: [2023-04-27 00:01:32,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
16: [2023-04-27 00:01:32,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 16: [2023-04-27 00:01:32,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 16: [2023-04-27 00:01:32,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 16: [2023-04-27 00:01:32,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 16: [2023-04-27 00:01:32,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 16: [2023-04-27 00:01:32,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 3: [2023-04-27 00:01:32,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
3: [2023-04-27 00:01:32,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 27: [2023-04-27 00:01:32,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
27: [2023-04-27 00:01:32,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 16: [2023-04-27 00:01:32,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
16: [2023-04-27 00:01:32,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
9: [2023-04-27 00:01:32,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
9: [2023-04-27 00:01:32,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:32,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:32,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:32,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:32,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:32,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
9: [2023-04-27 00:01:32,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:32,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
16: [2023-04-27 00:01:32,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 27: [2023-04-27 00:01:32,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
0: [2023-04-27 00:01:32,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 16: [2023-04-27 00:01:32,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:32,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:32,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
28: [2023-04-27 00:01:32,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 10: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
10: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 10: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 28: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
0: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 10: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 10: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 10: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 10: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
29: [2023-04-27 00:01:32,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:32,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 10: [2023-04-27 00:01:32,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:32,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:32,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:32,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:32,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 28: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
28: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 28: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 28: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 10: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 28: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 26: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 28: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 29: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
10: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 10: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:32,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
28: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 10: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 10: [2023-04-27 00:01:32,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
25: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 10: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 26: [2023-04-27 00:01:32,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 10: [2023-04-27 00:01:32,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:32,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 10: [2023-04-27 00:01:32,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:32,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
22: [2023-04-27 00:01:32,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
8: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 27: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
27: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 25: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 8: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
8: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
13: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
20: [2023-04-27 00:01:32,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
20: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
23: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
20: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 1: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
13: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 6: [2023-04-27 00:01:32,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
6: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 6: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
4: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 8: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
20: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 23: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 20: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 4: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
6: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
6: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 6: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 6: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
7: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
5: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
5: [2023-04-27 00:01:32,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:32,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:32,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
5: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
30: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 30: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 30: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
5: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 9: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 14: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 11: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
7: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:32,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 17: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 17: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
17: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 17: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 17: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
12: [2023-04-27 00:01:32,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 7: [2023-04-27 00:01:32,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
12: [2023-04-27 00:01:32,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 30: [2023-04-27 00:01:32,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:32,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:32,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 12: [2023-04-27 00:01:32,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 27: [2023-04-27 00:01:32,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 27: [2023-04-27 00:01:32,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 27: [2023-04-27 00:01:32,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
27: [2023-04-27 00:01:32,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 27: [2023-04-27 00:01:32,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 21: [2023-04-27 00:01:32,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 21: [2023-04-27 00:01:32,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:32,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 28: [2023-04-27 00:01:32,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 17: [2023-04-27 00:01:32,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:32,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:32,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
17: [2023-04-27 00:01:32,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 17: [2023-04-27 00:01:32,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:32,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
2: [2023-04-27 00:01:32,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:32,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:32,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
0: [2023-04-27 00:01:32,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:32,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 2: [2023-04-27 00:01:32,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 21: [2023-04-27 00:01:32,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
9: [2023-04-27 00:01:32,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 21: [2023-04-27 00:01:32,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 10: [2023-04-27 00:01:32,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
28: [2023-04-27 00:01:32,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
10: [2023-04-27 00:01:32,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 9: [2023-04-27 00:01:32,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
20: [2023-04-27 00:01:32,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:32,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
25: [2023-04-27 00:01:32,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
31: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 10: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 6: [2023-04-27 00:01:32,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:32,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
25: [2023-04-27 00:01:32,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 10: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
10: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
19: [2023-04-27 00:01:32,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 31: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 1: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
31: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:32,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 6: [2023-04-27 00:01:32,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 6: [2023-04-27 00:01:32,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 6: [2023-04-27 00:01:32,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 20: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
20: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 22: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
19: [2023-04-27 00:01:32,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 22: [2023-04-27 00:01:32,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 14: [2023-04-27 00:01:32,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 13: [2023-04-27 00:01:32,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 
17: [2023-04-27 00:01:32,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
5: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt... 5: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 5: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
8: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 6: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 6: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 6: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 6: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
10: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 11: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-04-27 00:01:32,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 13: [2023-04-27 00:01:32,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 30: [2023-04-27 00:01:32,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
7: [2023-04-27 00:01:32,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 28: [2023-04-27 00:01:32,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 29: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 26: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 26: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
18: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 12: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
22: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 26: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
8: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 8: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 12: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 6: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 28: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
30: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 30: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 30: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 30: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 30: [2023-04-27 00:01:32,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 18: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
10: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 26: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 28: [2023-04-27 00:01:32,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 28: [2023-04-27 00:01:32,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
18: [2023-04-27 00:01:32,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 28: [2023-04-27 00:01:32,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 14: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 14: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 26: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
23: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 26: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 26: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
25: [2023-04-27 00:01:32,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 25: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 13: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 15: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
15: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 15: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 15: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
0: [2023-04-27 00:01:32,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 6: [2023-04-27 00:01:32,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
25: [2023-04-27 00:01:32,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 6: [2023-04-27 00:01:32,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
15: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
11: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 14: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 4: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
23: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 17: [2023-04-27 00:01:32,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 23: [2023-04-27 00:01:32,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
13: [2023-04-27 00:01:32,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 13: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 13: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 13: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
1: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 2: [2023-04-27 00:01:32,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 30: [2023-04-27 00:01:32,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
5: [2023-04-27 00:01:32,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 6: [2023-04-27 00:01:32,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 14: [2023-04-27 00:01:32,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:32,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 6: [2023-04-27 00:01:32,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 6: [2023-04-27 00:01:32,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 6: [2023-04-27 00:01:32,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
7: [2023-04-27 00:01:32,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 17: [2023-04-27 00:01:32,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 17: [2023-04-27 00:01:32,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 17: [2023-04-27 00:01:32,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 17: [2023-04-27 00:01:32,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 
2: [2023-04-27 00:01:32,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:32,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:32,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
22: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
31: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 7: [2023-04-27 00:01:32,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
14: [2023-04-27 00:01:32,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:32,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 14: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 12: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
12: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 14: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
24: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 19: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_03-model_00-model_states.pt. 24: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
2: [2023-04-27 00:01:32,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
24: [2023-04-27 00:01:32,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
17: [2023-04-27 00:01:32,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 31: [2023-04-27 00:01:32,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 31: [2023-04-27 00:01:32,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
2: [2023-04-27 00:01:32,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 15: [2023-04-27 00:01:32,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 18: [2023-04-27 00:01:32,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
18: [2023-04-27 00:01:32,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 15: [2023-04-27 00:01:32,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 15: [2023-04-27 00:01:32,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 15: [2023-04-27 00:01:32,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 18: [2023-04-27 00:01:32,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
17: [2023-04-27 00:01:32,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
15: [2023-04-27 00:01:32,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 15: [2023-04-27 00:01:32,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 15: [2023-04-27 00:01:32,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
24: [2023-04-27 00:01:32,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
24: [2023-04-27 00:01:32,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 24: [2023-04-27 00:01:32,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
3: [2023-04-27 00:01:32,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 3: [2023-04-27 00:01:32,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 24: [2023-04-27 00:01:32,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
3: [2023-04-27 00:01:32,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 3: [2023-04-27 00:01:32,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
3: [2023-04-27 00:01:32,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
9: [2023-04-27 00:01:32,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
16: [2023-04-27 00:01:32,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
16: [2023-04-27 00:01:32,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 16: [2023-04-27 00:01:32,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 21: [2023-04-27 00:01:32,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
21: [2023-04-27 00:01:32,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
21: [2023-04-27 00:01:32,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 21: [2023-04-27 00:01:32,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 21: [2023-04-27 00:01:32,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 21: [2023-04-27 00:01:32,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 21: [2023-04-27 00:01:32,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 21: [2023-04-27 00:01:32,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 21: [2023-04-27 00:01:32,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
0: [2023-04-27 00:01:32,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 9: [2023-04-27 00:01:32,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
9: [2023-04-27 00:01:32,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
16: [2023-04-27 00:01:32,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 9: [2023-04-27 00:01:32,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
21: [2023-04-27 00:01:32,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 21: [2023-04-27 00:01:32,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 16: [2023-04-27 00:01:32,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
21: [2023-04-27 00:01:32,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
0: [2023-04-27 00:01:32,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-04-27 00:01:32,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
4: [2023-04-27 00:01:32,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
4: [2023-04-27 00:01:32,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
29: [2023-04-27 00:01:32,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
29: [2023-04-27 00:01:32,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-04-27 00:01:32,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
4: [2023-04-27 00:01:32,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
29: [2023-04-27 00:01:32,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 4: [2023-04-27 00:01:32,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
29: [2023-04-27 00:01:32,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
20: [2023-04-27 00:01:32,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 4: [2023-04-27 00:01:32,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
28: [2023-04-27 00:01:32,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
10: [2023-04-27 00:01:32,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 28: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
28: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
22: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 28: [2023-04-27 00:01:32,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 30: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 28: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
30: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 13: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
13: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
30: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
25: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 29: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
30: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 29: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 13: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
22: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
22: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 13: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 13: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
13: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 13: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
2: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 25: [2023-04-27 00:01:32,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
2: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
1: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
27: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
27: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
5: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
27: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
7: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
5: [2023-04-27 00:01:32,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 27: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 27: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 27: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 27: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 27: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
19: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 12: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 12: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 12: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 12: [2023-04-27 00:01:32,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 12: [2023-04-27 00:01:32,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
7: [2023-04-27 00:01:32,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
8: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
19: [2023-04-27 00:01:32,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
12: [2023-04-27 00:01:32,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 
8: [2023-04-27 00:01:32,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 17: [2023-04-27 00:01:32,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 8: [2023-04-27 00:01:32,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 20: [2023-04-27 00:01:32,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
20: [2023-04-27 00:01:32,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
14: [2023-04-27 00:01:32,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
28: [2023-04-27 00:01:32,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 28: [2023-04-27 00:01:32,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 23: [2023-04-27 00:01:32,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
14: [2023-04-27 00:01:32,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 14: [2023-04-27 00:01:32,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 14: [2023-04-27 00:01:32,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 14: [2023-04-27 00:01:32,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
14: [2023-04-27 00:01:32,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
13: [2023-04-27 00:01:32,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 30: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
1: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
20: [2023-04-27 00:01:32,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 20: [2023-04-27 00:01:32,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
10: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
30: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
25: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
5: [2023-04-27 00:01:32,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 5: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 11: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 23: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
27: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
28: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
7: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 13: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 28: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
12: [2023-04-27 00:01:32,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
1: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 26: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 26: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
1: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 1: [2023-04-27 00:01:32,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
12: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 12: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 12: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 12: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 10: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
6: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
27: [2023-04-27 00:01:32,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 10: [2023-04-27 00:01:32,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 13: [2023-04-27 00:01:32,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
2: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
6: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 5: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 7: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 22: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 25: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
7: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 2: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 2: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 6: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 19: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
19: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt... 11: [2023-04-27 00:01:32,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 22: [2023-04-27 00:01:32,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 1: [2023-04-27 00:01:32,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
1: [2023-04-27 00:01:32,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
11: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 19: [2023-04-27 00:01:32,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
19: [2023-04-27 00:01:32,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
27: [2023-04-27 00:01:32,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 8: [2023-04-27 00:01:32,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 27: [2023-04-27 00:01:32,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
27: [2023-04-27 00:01:32,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 7: [2023-04-27 00:01:32,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
12: [2023-04-27 00:01:32,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
17: [2023-04-27 00:01:32,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 14: [2023-04-27 00:01:32,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
8: [2023-04-27 00:01:32,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 17: [2023-04-27 00:01:32,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
17: [2023-04-27 00:01:32,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 26: [2023-04-27 00:01:32,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
14: [2023-04-27 00:01:32,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 6: [2023-04-27 00:01:32,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_04-model_00-model_states.pt. 
26: [2023-04-27 00:01:32,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
6: [2023-04-27 00:01:32,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
18: [2023-04-27 00:01:32,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
24: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
15: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 15: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
24: [2023-04-27 00:01:32,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 18: [2023-04-27 00:01:32,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
24: [2023-04-27 00:01:32,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
18: [2023-04-27 00:01:32,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
18: [2023-04-27 00:01:32,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
15: [2023-04-27 00:01:32,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 15: [2023-04-27 00:01:32,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
31: [2023-04-27 00:01:32,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 31: [2023-04-27 00:01:32,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
31: [2023-04-27 00:01:32,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 31: [2023-04-27 00:01:32,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
31: [2023-04-27 00:01:32,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 3: [2023-04-27 00:01:32,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,668] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
3: [2023-04-27 00:01:32,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
21: [2023-04-27 00:01:32,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
21: [2023-04-27 00:01:32,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 21: [2023-04-27 00:01:32,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 9: [2023-04-27 00:01:32,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
3: [2023-04-27 00:01:32,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
3: [2023-04-27 00:01:32,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 16: [2023-04-27 00:01:32,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 3: [2023-04-27 00:01:32,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 3: [2023-04-27 00:01:32,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 3: [2023-04-27 00:01:32,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
3: [2023-04-27 00:01:32,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 3: [2023-04-27 00:01:32,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 3: [2023-04-27 00:01:32,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
21: [2023-04-27 00:01:32,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
0: [2023-04-27 00:01:32,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
16: [2023-04-27 00:01:32,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 9: [2023-04-27 00:01:32,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 16: [2023-04-27 00:01:32,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
0: [2023-04-27 00:01:32,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 21: [2023-04-27 00:01:32,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-04-27 00:01:32,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
16: [2023-04-27 00:01:32,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
0: [2023-04-27 00:01:32,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 10: [2023-04-27 00:01:32,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
4: [2023-04-27 00:01:32,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
4: [2023-04-27 00:01:32,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 4: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
20: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 5: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
22: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 23: [2023-04-27 00:01:32,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
28: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
28: [2023-04-27 00:01:32,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 28: [2023-04-27 00:01:32,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 2: [2023-04-27 00:01:32,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
7: [2023-04-27 00:01:32,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
6: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
26: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
8: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
26: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 8: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 19: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 26: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 6: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
6: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 29: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 25: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 14: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 14: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 14: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 1: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 14: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 1: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
14: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 14: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 14: [2023-04-27 00:01:32,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 14: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 14: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 20: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
11: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 1: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
27: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 22: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
11: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 
11: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 27: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 11: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
17: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 30: [2023-04-27 00:01:32,786] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 30: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 30: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 17: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 10: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
24: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 10: [2023-04-27 00:01:32,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:32,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:32,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
24: [2023-04-27 00:01:32,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:32,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:32,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:32,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt... 24: [2023-04-27 00:01:32,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:32,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
22: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:32,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
24: [2023-04-27 00:01:32,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:32,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
20: [2023-04-27 00:01:32,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:32,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:32,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
28: [2023-04-27 00:01:32,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
28: [2023-04-27 00:01:32,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
8: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 19: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 19: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 1: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 1: [2023-04-27 00:01:32,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
1: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
14: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:32,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
23: [2023-04-27 00:01:32,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 6: [2023-04-27 00:01:32,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
29: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 4: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
10: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:32,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:32,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
22: [2023-04-27 00:01:32,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:32,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:32,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
6: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 6: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
26: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 30: [2023-04-27 00:01:32,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 19: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 24: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
22: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:32,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 5: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
19: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 30: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 30: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
28: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:32,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 10: [2023-04-27 00:01:32,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:32,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 10: [2023-04-27 00:01:32,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 7: [2023-04-27 00:01:32,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
24: [2023-04-27 00:01:32,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:32,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 25: [2023-04-27 00:01:32,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 20: [2023-04-27 00:01:32,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
25: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 7: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 28: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
11: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:32,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 6: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 1: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 11: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
20: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 6: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 7: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
19: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 29: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
29: [2023-04-27 00:01:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 6: [2023-04-27 00:01:32,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 17: [2023-04-27 00:01:32,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
6: [2023-04-27 00:01:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:32,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 14: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 14: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 14: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 23: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
29: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 22: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 23: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 23: [2023-04-27 00:01:32,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
11: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
27: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 27: [2023-04-27 00:01:32,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 12: [2023-04-27 00:01:32,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 2: [2023-04-27 00:01:32,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
29: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 6: [2023-04-27 00:01:32,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
8: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 8: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 26: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 6: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:32,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
6: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 24: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 1: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:32,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:32,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:32,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
2: [2023-04-27 00:01:32,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 7: [2023-04-27 00:01:32,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:32,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 13: [2023-04-27 00:01:32,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:32,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
24: [2023-04-27 00:01:32,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:32,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:32,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:32,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 7: [2023-04-27 00:01:32,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:32,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
7: [2023-04-27 00:01:32,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:32,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:32,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:32,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:32,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:32,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
17: [2023-04-27 00:01:32,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 17: [2023-04-27 00:01:32,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:32,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:32,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:32,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:32,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:32,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:32,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:32,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
24: [2023-04-27 00:01:32,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:32,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:32,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 17: [2023-04-27 00:01:32,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 17: [2023-04-27 00:01:32,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 
17: [2023-04-27 00:01:32,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 8: [2023-04-27 00:01:32,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:32,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
18: [2023-04-27 00:01:32,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 24: [2023-04-27 00:01:32,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:32,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:32,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 17: [2023-04-27 00:01:32,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
17: [2023-04-27 00:01:32,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:32,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_05-model_00-model_states.pt. 18: [2023-04-27 00:01:32,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:32,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
9: [2023-04-27 00:01:32,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
18: [2023-04-27 00:01:32,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:32,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:32,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:32,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
18: [2023-04-27 00:01:32,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:32,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:32,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:32,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:32,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
31: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
31: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
15: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
15: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
15: [2023-04-27 00:01:32,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:32,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
0: [2023-04-27 00:01:32,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
0: [2023-04-27 00:01:32,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
9: [2023-04-27 00:01:32,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:32,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:32,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:32,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:32,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
9: [2023-04-27 00:01:32,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
21: [2023-04-27 00:01:32,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:32,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:32,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
3: [2023-04-27 00:01:32,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 3: [2023-04-27 00:01:32,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 3: [2023-04-27 00:01:32,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 3: [2023-04-27 00:01:32,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
3: [2023-04-27 00:01:32,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:32,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:32,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:32,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-04-27 00:01:32,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:32,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 3: [2023-04-27 00:01:32,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
3: [2023-04-27 00:01:32,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 31: [2023-04-27 00:01:32,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:32,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:32,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:32,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:32,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:32,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:32,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:32,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:32,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
16: [2023-04-27 00:01:32,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:32,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:32,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:32,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:32,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:32,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:32,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:32,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:32,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:32,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
21: [2023-04-27 00:01:32,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:32,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:32,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:32,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 21: [2023-04-27 00:01:32,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
16: [2023-04-27 00:01:32,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:32,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:32,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 15: [2023-04-27 00:01:32,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:32,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
31: [2023-04-27 00:01:32,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:32,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:32,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:32,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:32,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:32,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:32,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:32,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:32,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
16: [2023-04-27 00:01:32,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:32,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:32,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:32,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:32,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:32,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:32,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:32,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:32,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:32,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
3: [2023-04-27 00:01:32,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:32,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-04-27 00:01:32,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:32,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:32,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:32,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:32,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
0: [2023-04-27 00:01:32,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:32,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:32,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:32,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:32,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
20: [2023-04-27 00:01:33,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
20: [2023-04-27 00:01:33,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 20: [2023-04-27 00:01:33,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
28: [2023-04-27 00:01:33,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
28: [2023-04-27 00:01:33,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:33,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:33,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
18: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:33,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:33,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:33,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
9: [2023-04-27 00:01:33,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
9: [2023-04-27 00:01:33,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:33,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
18: [2023-04-27 00:01:33,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:33,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:33,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
18: [2023-04-27 00:01:33,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:33,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:33,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
4: [2023-04-27 00:01:33,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:33,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:33,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:33,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 18: [2023-04-27 00:01:33,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
24: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 24: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 24: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 24: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
24: [2023-04-27 00:01:33,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 24: [2023-04-27 00:01:33,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:33,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
9: [2023-04-27 00:01:33,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:33,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:33,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:33,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
19: [2023-04-27 00:01:33,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:33,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 9: [2023-04-27 00:01:33,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 18: [2023-04-27 00:01:33,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 4: [2023-04-27 00:01:33,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
19: [2023-04-27 00:01:33,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:33,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:33,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 24: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
9: [2023-04-27 00:01:33,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 24: [2023-04-27 00:01:33,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
15: [2023-04-27 00:01:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:33,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
15: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 24: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 9: [2023-04-27 00:01:33,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
19: [2023-04-27 00:01:33,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:33,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
19: [2023-04-27 00:01:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 24: [2023-04-27 00:01:33,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:33,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:33,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:33,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
31: [2023-04-27 00:01:33,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
3: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
31: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:33,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 24: [2023-04-27 00:01:33,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
19: [2023-04-27 00:01:33,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:33,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
5: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 5: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 4: [2023-04-27 00:01:33,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
4: [2023-04-27 00:01:33,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 4: [2023-04-27 00:01:33,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:33,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:33,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 21: [2023-04-27 00:01:33,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
21: [2023-04-27 00:01:33,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
31: [2023-04-27 00:01:33,154] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:33,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 15: [2023-04-27 00:01:33,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
15: [2023-04-27 00:01:33,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
3: [2023-04-27 00:01:33,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:33,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 3: [2023-04-27 00:01:33,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
3: [2023-04-27 00:01:33,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:33,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:33,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 3: [2023-04-27 00:01:33,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
3: [2023-04-27 00:01:33,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
31: [2023-04-27 00:01:33,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
19: [2023-04-27 00:01:33,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 3: [2023-04-27 00:01:33,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
19: [2023-04-27 00:01:33,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:33,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
20: [2023-04-27 00:01:33,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 31: [2023-04-27 00:01:33,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
0: [2023-04-27 00:01:33,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:33,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
20: [2023-04-27 00:01:33,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
16: [2023-04-27 00:01:33,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
27: [2023-04-27 00:01:33,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:33,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
26: [2023-04-27 00:01:33,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:33,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:33,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:33,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
26: [2023-04-27 00:01:33,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 26: [2023-04-27 00:01:33,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 10: [2023-04-27 00:01:33,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 10: [2023-04-27 00:01:33,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
20: [2023-04-27 00:01:33,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 10: [2023-04-27 00:01:33,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:33,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
10: [2023-04-27 00:01:33,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 10: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 10: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 10: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 10: [2023-04-27 00:01:33,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
13: [2023-04-27 00:01:33,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:33,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:33,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
20: [2023-04-27 00:01:33,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 20: [2023-04-27 00:01:33,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
29: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
29: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
29: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 10: [2023-04-27 00:01:33,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 16: [2023-04-27 00:01:33,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:33,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:33,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 29: [2023-04-27 00:01:33,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
12: [2023-04-27 00:01:33,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 16: [2023-04-27 00:01:33,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
4: [2023-04-27 00:01:33,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
13: [2023-04-27 00:01:33,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:33,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 11: [2023-04-27 00:01:33,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
8: [2023-04-27 00:01:33,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:33,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:33,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 8: [2023-04-27 00:01:33,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 10: [2023-04-27 00:01:33,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
16: [2023-04-27 00:01:33,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
30: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
30: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 8: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 8: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 8: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 8: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
8: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 29: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:33,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:33,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
8: [2023-04-27 00:01:33,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 28: [2023-04-27 00:01:33,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 28: [2023-04-27 00:01:33,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 8: [2023-04-27 00:01:33,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:33,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
8: [2023-04-27 00:01:33,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:33,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 13: [2023-04-27 00:01:33,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 4: [2023-04-27 00:01:33,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
13: [2023-04-27 00:01:33,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 4: [2023-04-27 00:01:33,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 4: [2023-04-27 00:01:33,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 26: [2023-04-27 00:01:33,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
28: [2023-04-27 00:01:33,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 29: [2023-04-27 00:01:33,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
13: [2023-04-27 00:01:33,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 13: [2023-04-27 00:01:33,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
4: [2023-04-27 00:01:33,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 4: [2023-04-27 00:01:33,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
12: [2023-04-27 00:01:33,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 4: [2023-04-27 00:01:33,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 29: [2023-04-27 00:01:33,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 29: [2023-04-27 00:01:33,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
4: [2023-04-27 00:01:33,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 12: [2023-04-27 00:01:33,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 10: [2023-04-27 00:01:33,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
10: [2023-04-27 00:01:33,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 10: [2023-04-27 00:01:33,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 7: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 7: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
19: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 11: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 10: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
29: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 7: [2023-04-27 00:01:33,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 7: [2023-04-27 00:01:33,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 30: [2023-04-27 00:01:33,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:33,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 12: [2023-04-27 00:01:33,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
2: [2023-04-27 00:01:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:33,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 8: [2023-04-27 00:01:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:33,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:33,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
30: [2023-04-27 00:01:33,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 6: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 6: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
6: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
1: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 22: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
22: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:33,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
25: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
6: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 22: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 22: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 22: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
22: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:33,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:33,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:33,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:33,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
22: [2023-04-27 00:01:33,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 7: [2023-04-27 00:01:33,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:33,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:33,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 25: [2023-04-27 00:01:33,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 7: [2023-04-27 00:01:33,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:33,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 14: [2023-04-27 00:01:33,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
7: [2023-04-27 00:01:33,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 19: [2023-04-27 00:01:33,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
30: [2023-04-27 00:01:33,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 30: [2023-04-27 00:01:33,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 8: [2023-04-27 00:01:33,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
1: [2023-04-27 00:01:33,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 8: [2023-04-27 00:01:33,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 1: [2023-04-27 00:01:33,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
25: [2023-04-27 00:01:33,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 19: [2023-04-27 00:01:33,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 1: [2023-04-27 00:01:33,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
1: [2023-04-27 00:01:33,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 22: [2023-04-27 00:01:33,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
19: [2023-04-27 00:01:33,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
27: [2023-04-27 00:01:33,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 8: [2023-04-27 00:01:33,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
8: [2023-04-27 00:01:33,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 17: [2023-04-27 00:01:33,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 1: [2023-04-27 00:01:33,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
2: [2023-04-27 00:01:33,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
25: [2023-04-27 00:01:33,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 1: [2023-04-27 00:01:33,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
1: [2023-04-27 00:01:33,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:33,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 17: [2023-04-27 00:01:33,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
14: [2023-04-27 00:01:33,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 17: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 22: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
23: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 2: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
6: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 25: [2023-04-27 00:01:33,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 22: [2023-04-27 00:01:33,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
23: [2023-04-27 00:01:33,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 23: [2023-04-27 00:01:33,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 23: [2023-04-27 00:01:33,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 23: [2023-04-27 00:01:33,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 2: [2023-04-27 00:01:33,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 7: [2023-04-27 00:01:33,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 23: [2023-04-27 00:01:33,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 
23: [2023-04-27 00:01:33,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt... 27: [2023-04-27 00:01:33,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 1: [2023-04-27 00:01:33,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 1: [2023-04-27 00:01:33,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
17: [2023-04-27 00:01:33,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 5: [2023-04-27 00:01:33,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
25: [2023-04-27 00:01:33,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 22: [2023-04-27 00:01:33,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 6: [2023-04-27 00:01:33,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 1: [2023-04-27 00:01:33,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
6: [2023-04-27 00:01:33,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 14: [2023-04-27 00:01:33,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
1: [2023-04-27 00:01:33,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 22: [2023-04-27 00:01:33,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
25: [2023-04-27 00:01:33,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 1: [2023-04-27 00:01:33,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
23: [2023-04-27 00:01:33,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 27: [2023-04-27 00:01:33,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
23: [2023-04-27 00:01:33,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 
17: [2023-04-27 00:01:33,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 23: [2023-04-27 00:01:33,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_06-model_00-model_states.pt. 17: [2023-04-27 00:01:33,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
17: [2023-04-27 00:01:33,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
26: [2023-04-27 00:01:33,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
26: [2023-04-27 00:01:33,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
26: [2023-04-27 00:01:33,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
16: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
16: [2023-04-27 00:01:33,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
21: [2023-04-27 00:01:33,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 21: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
21: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 21: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
29: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
8: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 10: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
10: [2023-04-27 00:01:33,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 10: [2023-04-27 00:01:33,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 12: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
10: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
10: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 10: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 10: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 10: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 18: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 18: [2023-04-27 00:01:33,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
26: [2023-04-27 00:01:33,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 16: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 13: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
16: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
0: [2023-04-27 00:01:33,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 0: [2023-04-27 00:01:33,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
0: [2023-04-27 00:01:33,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
19: [2023-04-27 00:01:33,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
16: [2023-04-27 00:01:33,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
30: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 29: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
30: [2023-04-27 00:01:33,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 26: [2023-04-27 00:01:33,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,450] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 26: [2023-04-27 00:01:33,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
16: [2023-04-27 00:01:33,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:33,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 3: [2023-04-27 00:01:33,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 3: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
3: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
31: [2023-04-27 00:01:33,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
31: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
24: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
31: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
15: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 15: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 18: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 3: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
15: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:33,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 15: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 15: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 18: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
8: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 31: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
27: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 15: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
15: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 15: [2023-04-27 00:01:33,458] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 15: [2023-04-27 00:01:33,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 29: [2023-04-27 00:01:33,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
5: [2023-04-27 00:01:33,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
29: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:33,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,462] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
27: [2023-04-27 00:01:33,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 29: [2023-04-27 00:01:33,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 16: [2023-04-27 00:01:33,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 3: [2023-04-27 00:01:33,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
0: [2023-04-27 00:01:33,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
13: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 13: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 10: [2023-04-27 00:01:33,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
10: [2023-04-27 00:01:33,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 18: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 30: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
25: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 30: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
18: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 15: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
24: [2023-04-27 00:01:33,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
13: [2023-04-27 00:01:33,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
0: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 19: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
15: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 15: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 19: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
24: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 24: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:33,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
13: [2023-04-27 00:01:33,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 18: [2023-04-27 00:01:33,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:33,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 28: [2023-04-27 00:01:33,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
30: [2023-04-27 00:01:33,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 15: [2023-04-27 00:01:33,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
30: [2023-04-27 00:01:33,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 24: [2023-04-27 00:01:33,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
9: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 15: [2023-04-27 00:01:33,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:33,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
15: [2023-04-27 00:01:33,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 19: [2023-04-27 00:01:33,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 19: [2023-04-27 00:01:33,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 28: [2023-04-27 00:01:33,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
10: [2023-04-27 00:01:33,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 5: [2023-04-27 00:01:33,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 9: [2023-04-27 00:01:33,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 18: [2023-04-27 00:01:33,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:33,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 19: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
31: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
24: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 24: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 27: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 5: [2023-04-27 00:01:33,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
8: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 20: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 20: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 31: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 5: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 2: [2023-04-27 00:01:33,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 31: [2023-04-27 00:01:33,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:33,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
31: [2023-04-27 00:01:33,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 27: [2023-04-27 00:01:33,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:33,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
0: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 5: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
5: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
14: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 9: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 0: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
1: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 9: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 1: [2023-04-27 00:01:33,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
14: [2023-04-27 00:01:33,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,494] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 19: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 8: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
17: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 25: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
22: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
17: [2023-04-27 00:01:33,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 27: [2023-04-27 00:01:33,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 27: [2023-04-27 00:01:33,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 27: [2023-04-27 00:01:33,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
5: [2023-04-27 00:01:33,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:33,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 17: [2023-04-27 00:01:33,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 17: [2023-04-27 00:01:33,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 17: [2023-04-27 00:01:33,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 17: [2023-04-27 00:01:33,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
17: [2023-04-27 00:01:33,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 28: [2023-04-27 00:01:33,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 31: [2023-04-27 00:01:33,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:33,501] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
4: [2023-04-27 00:01:33,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 20: [2023-04-27 00:01:33,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 4: [2023-04-27 00:01:33,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
4: [2023-04-27 00:01:33,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 27: [2023-04-27 00:01:33,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 7: [2023-04-27 00:01:33,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 19: [2023-04-27 00:01:33,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
25: [2023-04-27 00:01:33,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 28: [2023-04-27 00:01:33,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:33,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 25: [2023-04-27 00:01:33,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
2: [2023-04-27 00:01:33,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 20: [2023-04-27 00:01:33,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
2: [2023-04-27 00:01:33,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,512] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 15: [2023-04-27 00:01:33,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 27: [2023-04-27 00:01:33,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 6: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 22: [2023-04-27 00:01:33,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
22: [2023-04-27 00:01:33,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 17: [2023-04-27 00:01:33,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
11: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 23: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 11: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 
23: [2023-04-27 00:01:33,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt... 14: [2023-04-27 00:01:33,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 2: [2023-04-27 00:01:33,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
14: [2023-04-27 00:01:33,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 27: [2023-04-27 00:01:33,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
22: [2023-04-27 00:01:33,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 14: [2023-04-27 00:01:33,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 22: [2023-04-27 00:01:33,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
2: [2023-04-27 00:01:33,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 17: [2023-04-27 00:01:33,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 11: [2023-04-27 00:01:33,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
11: [2023-04-27 00:01:33,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 1: [2023-04-27 00:01:33,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
22: [2023-04-27 00:01:33,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
22: [2023-04-27 00:01:33,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 17: [2023-04-27 00:01:33,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
6: [2023-04-27 00:01:33,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 6: [2023-04-27 00:01:33,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 7: [2023-04-27 00:01:33,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 23: [2023-04-27 00:01:33,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 17: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 17: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 17: [2023-04-27 00:01:33,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
6: [2023-04-27 00:01:33,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 4: [2023-04-27 00:01:33,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:33,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 8: [2023-04-27 00:01:33,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 
8: [2023-04-27 00:01:33,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
6: [2023-04-27 00:01:33,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 29: [2023-04-27 00:01:33,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
17: [2023-04-27 00:01:33,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 26: [2023-04-27 00:01:33,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
26: [2023-04-27 00:01:33,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
17: [2023-04-27 00:01:33,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_07-model_00-model_states.pt. 12: [2023-04-27 00:01:33,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
29: [2023-04-27 00:01:33,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:33,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 12: [2023-04-27 00:01:33,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:33,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
29: [2023-04-27 00:01:33,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:33,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 12: [2023-04-27 00:01:33,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:33,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:33,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
12: [2023-04-27 00:01:33,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:33,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 29: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
30: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
30: [2023-04-27 00:01:33,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
10: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
26: [2023-04-27 00:01:33,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 26: [2023-04-27 00:01:33,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
10: [2023-04-27 00:01:33,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 8: [2023-04-27 00:01:33,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
10: [2023-04-27 00:01:33,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 8: [2023-04-27 00:01:33,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:33,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
10: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:33,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
13: [2023-04-27 00:01:33,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 30: [2023-04-27 00:01:33,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:33,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
13: [2023-04-27 00:01:33,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 30: [2023-04-27 00:01:33,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
25: [2023-04-27 00:01:33,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:33,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
2: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 2: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:33,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
10: [2023-04-27 00:01:33,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:33,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 25: [2023-04-27 00:01:33,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:33,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:33,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:33,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
13: [2023-04-27 00:01:33,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:33,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 10: [2023-04-27 00:01:33,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
17: [2023-04-27 00:01:33,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 25: [2023-04-27 00:01:33,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 13: [2023-04-27 00:01:33,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
13: [2023-04-27 00:01:33,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 10: [2023-04-27 00:01:33,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:33,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:33,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 7: [2023-04-27 00:01:33,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:33,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
17: [2023-04-27 00:01:33,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:33,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:33,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:33,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:33,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:33,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:33,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
25: [2023-04-27 00:01:33,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:33,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:33,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 13: [2023-04-27 00:01:33,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
16: [2023-04-27 00:01:33,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,679] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:33,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
13: [2023-04-27 00:01:33,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
7: [2023-04-27 00:01:33,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 2: [2023-04-27 00:01:33,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 7: [2023-04-27 00:01:33,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:33,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
16: [2023-04-27 00:01:33,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:33,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 17: [2023-04-27 00:01:33,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 17: [2023-04-27 00:01:33,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
21: [2023-04-27 00:01:33,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
21: [2023-04-27 00:01:33,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
21: [2023-04-27 00:01:33,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
17: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
1: [2023-04-27 00:01:33,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 1: [2023-04-27 00:01:33,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 0: [2023-04-27 00:01:33,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
16: [2023-04-27 00:01:33,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:33,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
23: [2023-04-27 00:01:33,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 23: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 11: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 17: [2023-04-27 00:01:33,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
17: [2023-04-27 00:01:33,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 16: [2023-04-27 00:01:33,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:33,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
6: [2023-04-27 00:01:33,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 16: [2023-04-27 00:01:33,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:33,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 22: [2023-04-27 00:01:33,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
14: [2023-04-27 00:01:33,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 
14: [2023-04-27 00:01:33,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 6: [2023-04-27 00:01:33,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 14: [2023-04-27 00:01:33,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt... 21: [2023-04-27 00:01:33,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:33,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
21: [2023-04-27 00:01:33,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:33,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:33,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:33,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:33,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
1: [2023-04-27 00:01:33,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 11: [2023-04-27 00:01:33,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:33,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:33,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
11: [2023-04-27 00:01:33,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:33,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:33,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:33,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:33,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 
14: [2023-04-27 00:01:33,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:33,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 6: [2023-04-27 00:01:33,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 21: [2023-04-27 00:01:33,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:33,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
1: [2023-04-27 00:01:33,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:33,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:33,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:33,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
6: [2023-04-27 00:01:33,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:33,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:33,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 23: [2023-04-27 00:01:33,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
23: [2023-04-27 00:01:33,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-04-27 00:01:33,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:33,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 6: [2023-04-27 00:01:33,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:33,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
21: [2023-04-27 00:01:33,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
0: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
21: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 6: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 6: [2023-04-27 00:01:33,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:33,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
9: [2023-04-27 00:01:33,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:33,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:33,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:33,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:33,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
1: [2023-04-27 00:01:33,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 14: [2023-04-27 00:01:33,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 6: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 22: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
31: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
0: [2023-04-27 00:01:33,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
9: [2023-04-27 00:01:33,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 1: [2023-04-27 00:01:33,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 6: [2023-04-27 00:01:33,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 6: [2023-04-27 00:01:33,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_08-model_00-model_states.pt. 9: [2023-04-27 00:01:33,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:33,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
31: [2023-04-27 00:01:33,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:33,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
31: [2023-04-27 00:01:33,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:33,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:33,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:33,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
31: [2023-04-27 00:01:33,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:33,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 3: [2023-04-27 00:01:33,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 3: [2023-04-27 00:01:33,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 3: [2023-04-27 00:01:33,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
3: [2023-04-27 00:01:33,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 31: [2023-04-27 00:01:33,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 3: [2023-04-27 00:01:33,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 3: [2023-04-27 00:01:33,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:33,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
31: [2023-04-27 00:01:33,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:33,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:33,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:33,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:33,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,812] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:33,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
3: [2023-04-27 00:01:33,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:33,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:33,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:33,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:33,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:33,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:33,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:33,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
8: [2023-04-27 00:01:33,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
19: [2023-04-27 00:01:33,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 19: [2023-04-27 00:01:33,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 19: [2023-04-27 00:01:33,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 19: [2023-04-27 00:01:33,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 19: [2023-04-27 00:01:33,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 19: [2023-04-27 00:01:33,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
19: [2023-04-27 00:01:33,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:33,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:33,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
19: [2023-04-27 00:01:33,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
24: [2023-04-27 00:01:33,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,891] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
8: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:33,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:33,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:33,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:33,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:33,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:33,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:33,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
8: [2023-04-27 00:01:33,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:33,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 19: [2023-04-27 00:01:33,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
28: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 8: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 7: [2023-04-27 00:01:33,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
5: [2023-04-27 00:01:33,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:33,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:33,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:33,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 4: [2023-04-27 00:01:33,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 4: [2023-04-27 00:01:33,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
4: [2023-04-27 00:01:33,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 4: [2023-04-27 00:01:33,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 4: [2023-04-27 00:01:33,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 4: [2023-04-27 00:01:33,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 4: [2023-04-27 00:01:33,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 4: [2023-04-27 00:01:33,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 4: [2023-04-27 00:01:33,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
19: [2023-04-27 00:01:33,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:33,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:33,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:33,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 8: [2023-04-27 00:01:33,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:33,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 4: [2023-04-27 00:01:33,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 5: [2023-04-27 00:01:33,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:33,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
6: [2023-04-27 00:01:33,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
6: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 6: [2023-04-27 00:01:33,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 6: [2023-04-27 00:01:33,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 6: [2023-04-27 00:01:33,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 0: [2023-04-27 00:01:33,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:33,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:33,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:33,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
6: [2023-04-27 00:01:33,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 28: [2023-04-27 00:01:33,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:33,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:33,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 24: [2023-04-27 00:01:33,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
5: [2023-04-27 00:01:33,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:33,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
24: [2023-04-27 00:01:33,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:33,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
16: [2023-04-27 00:01:33,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
4: [2023-04-27 00:01:33,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:33,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:33,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:33,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:33,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
4: [2023-04-27 00:01:33,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:33,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:33,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:33,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 4: [2023-04-27 00:01:33,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:33,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:33,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 6: [2023-04-27 00:01:33,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
14: [2023-04-27 00:01:33,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
7: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:33,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:33,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:33,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
14: [2023-04-27 00:01:33,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 16: [2023-04-27 00:01:33,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:33,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 21: [2023-04-27 00:01:33,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:33,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:33,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
4: [2023-04-27 00:01:33,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 4: [2023-04-27 00:01:33,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:33,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:33,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 28: [2023-04-27 00:01:33,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:33,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:33,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:33,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 7: [2023-04-27 00:01:33,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
16: [2023-04-27 00:01:33,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:33,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:33,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:33,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:33,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:33,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:33,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
0: [2023-04-27 00:01:33,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:33,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:33,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:33,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:33,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:33,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:33,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
4: [2023-04-27 00:01:33,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:33,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:33,952] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:33,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:33,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:33,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:33,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
4: [2023-04-27 00:01:33,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:33,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:33,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:33,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:33,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:33,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
0: [2023-04-27 00:01:33,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:33,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 26: [2023-04-27 00:01:33,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:33,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:33,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:33,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
26: [2023-04-27 00:01:33,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 21: [2023-04-27 00:01:33,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 14: [2023-04-27 00:01:33,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 0: [2023-04-27 00:01:33,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:33,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:33,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:33,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 0: [2023-04-27 00:01:33,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:33,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 26: [2023-04-27 00:01:33,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
14: [2023-04-27 00:01:33,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:33,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:33,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:33,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:33,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:33,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:33,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:33,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:33,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
26: [2023-04-27 00:01:33,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:33,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 26: [2023-04-27 00:01:33,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
29: [2023-04-27 00:01:33,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 16: [2023-04-27 00:01:33,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:33,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:33,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
26: [2023-04-27 00:01:33,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 14: [2023-04-27 00:01:33,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 14: [2023-04-27 00:01:33,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:33,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
29: [2023-04-27 00:01:33,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 26: [2023-04-27 00:01:33,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:33,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:33,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:33,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:33,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
9: [2023-04-27 00:01:33,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:33,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:33,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:34,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:34,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:34,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:34,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:34,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:34,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:34,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:34,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
29: [2023-04-27 00:01:34,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:34,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:34,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:34,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:34,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:34,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:34,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:34,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:34,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:34,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
22: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
25: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:34,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 29: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
25: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 22: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:34,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
23: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
13: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 25: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 17: [2023-04-27 00:01:34,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 12: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
25: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 17: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
2: [2023-04-27 00:01:34,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 17: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 17: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 17: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
17: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 13: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
2: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 20: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
2: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 2: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:34,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 18: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
20: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 20: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 20: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 20: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 29: [2023-04-27 00:01:34,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:34,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
20: [2023-04-27 00:01:34,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:34,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:34,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:34,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 9: [2023-04-27 00:01:34,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 27: [2023-04-27 00:01:34,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
27: [2023-04-27 00:01:34,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 27: [2023-04-27 00:01:34,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 11: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 27: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
25: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 27: [2023-04-27 00:01:34,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 23: [2023-04-27 00:01:34,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:34,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
27: [2023-04-27 00:01:34,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 20: [2023-04-27 00:01:34,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 9: [2023-04-27 00:01:34,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:34,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
22: [2023-04-27 00:01:34,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
1: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:34,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 1: [2023-04-27 00:01:34,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 15: [2023-04-27 00:01:34,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
1: [2023-04-27 00:01:34,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 10: [2023-04-27 00:01:34,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 22: [2023-04-27 00:01:34,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 25: [2023-04-27 00:01:34,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
20: [2023-04-27 00:01:34,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 18: [2023-04-27 00:01:34,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
25: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 12: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
25: [2023-04-27 00:01:34,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 20: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 20: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 20: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
15: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
12: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
31: [2023-04-27 00:01:34,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 18: [2023-04-27 00:01:34,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 25: [2023-04-27 00:01:34,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
13: [2023-04-27 00:01:34,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 12: [2023-04-27 00:01:34,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 11: [2023-04-27 00:01:34,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 12: [2023-04-27 00:01:34,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 12: [2023-04-27 00:01:34,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 10: [2023-04-27 00:01:34,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:34,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
17: [2023-04-27 00:01:34,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 18: [2023-04-27 00:01:34,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 15: [2023-04-27 00:01:34,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 25: [2023-04-27 00:01:34,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 25: [2023-04-27 00:01:34,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
25: [2023-04-27 00:01:34,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 18: [2023-04-27 00:01:34,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
20: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
17: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 17: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 12: [2023-04-27 00:01:34,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
1: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 1: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 25: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
12: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 3: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
18: [2023-04-27 00:01:34,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 13: [2023-04-27 00:01:34,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 25: [2023-04-27 00:01:34,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 23: [2023-04-27 00:01:34,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 17: [2023-04-27 00:01:34,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 18: [2023-04-27 00:01:34,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:34,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:34,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 
1: [2023-04-27 00:01:34,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:34,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:34,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 2: [2023-04-27 00:01:34,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
20: [2023-04-27 00:01:34,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:34,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 18: [2023-04-27 00:01:34,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
27: [2023-04-27 00:01:34,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 25: [2023-04-27 00:01:34,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
24: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
24: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:34,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
19: [2023-04-27 00:01:34,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 19: [2023-04-27 00:01:34,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 30: [2023-04-27 00:01:34,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 30: [2023-04-27 00:01:34,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 30: [2023-04-27 00:01:34,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:34,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:34,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 31: [2023-04-27 00:01:34,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 30: [2023-04-27 00:01:34,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:34,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 
1: [2023-04-27 00:01:34,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 30: [2023-04-27 00:01:34,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:34,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 30: [2023-04-27 00:01:34,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 30: [2023-04-27 00:01:34,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 30: [2023-04-27 00:01:34,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 19: [2023-04-27 00:01:34,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
19: [2023-04-27 00:01:34,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 30: [2023-04-27 00:01:34,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt... 17: [2023-04-27 00:01:34,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 27: [2023-04-27 00:01:34,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
24: [2023-04-27 00:01:34,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 17: [2023-04-27 00:01:34,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 8: [2023-04-27 00:01:34,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 8: [2023-04-27 00:01:34,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:34,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
8: [2023-04-27 00:01:34,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:34,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 8: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
8: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 8: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 8: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
8: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:34,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
28: [2023-04-27 00:01:34,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 4: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:34,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 30: [2023-04-27 00:01:34,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 30: [2023-04-27 00:01:34,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:34,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
30: [2023-04-27 00:01:34,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 24: [2023-04-27 00:01:34,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
3: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:34,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 6: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:34,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:34,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
8: [2023-04-27 00:01:34,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 8: [2023-04-27 00:01:34,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
30: [2023-04-27 00:01:34,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 5: [2023-04-27 00:01:34,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 28: [2023-04-27 00:01:34,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
28: [2023-04-27 00:01:34,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_09-model_00-model_states.pt. 6: [2023-04-27 00:01:34,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
5: [2023-04-27 00:01:34,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:34,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
28: [2023-04-27 00:01:34,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 5: [2023-04-27 00:01:34,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
5: [2023-04-27 00:01:34,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 8: [2023-04-27 00:01:34,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 8: [2023-04-27 00:01:34,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 8: [2023-04-27 00:01:34,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 28: [2023-04-27 00:01:34,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
28: [2023-04-27 00:01:34,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
4: [2023-04-27 00:01:34,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 4: [2023-04-27 00:01:34,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
6: [2023-04-27 00:01:34,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 6: [2023-04-27 00:01:34,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 8: [2023-04-27 00:01:34,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
8: [2023-04-27 00:01:34,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 7: [2023-04-27 00:01:34,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:34,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
0: [2023-04-27 00:01:34,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:34,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:34,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:34,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
7: [2023-04-27 00:01:34,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:34,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:34,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:34,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:34,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
29: [2023-04-27 00:01:34,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
29: [2023-04-27 00:01:34,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 7: [2023-04-27 00:01:34,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:34,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:34,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
29: [2023-04-27 00:01:34,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:34,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 29: [2023-04-27 00:01:34,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:34,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:34,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 29: [2023-04-27 00:01:34,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
29: [2023-04-27 00:01:34,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 0: [2023-04-27 00:01:34,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
0: [2023-04-27 00:01:34,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:34,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:34,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 0: [2023-04-27 00:01:34,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
0: [2023-04-27 00:01:34,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:34,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
26: [2023-04-27 00:01:34,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 14: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 14: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 14: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
14: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 9: [2023-04-27 00:01:34,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
9: [2023-04-27 00:01:34,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
22: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 23: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
9: [2023-04-27 00:01:34,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:34,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
13: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
13: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 13: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:34,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
16: [2023-04-27 00:01:34,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:34,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:34,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:34,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 21: [2023-04-27 00:01:34,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:34,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
21: [2023-04-27 00:01:34,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:34,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 14: [2023-04-27 00:01:34,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 14: [2023-04-27 00:01:34,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
21: [2023-04-27 00:01:34,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 14: [2023-04-27 00:01:34,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:34,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
22: [2023-04-27 00:01:34,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
26: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 21: [2023-04-27 00:01:34,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 1: [2023-04-27 00:01:34,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
1: [2023-04-27 00:01:34,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 23: [2023-04-27 00:01:34,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 26: [2023-04-27 00:01:34,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
13: [2023-04-27 00:01:34,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 22: [2023-04-27 00:01:34,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:34,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
22: [2023-04-27 00:01:34,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 12: [2023-04-27 00:01:34,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
13: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
16: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
30: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
30: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
3: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 11: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 20: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 10: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 10: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 30: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 10: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 10: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 3: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 17: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
3: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 1: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 27: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 10: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 15: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 10: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
26: [2023-04-27 00:01:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 26: [2023-04-27 00:01:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 26: [2023-04-27 00:01:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 26: [2023-04-27 00:01:34,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 17: [2023-04-27 00:01:34,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 26: [2023-04-27 00:01:34,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 17: [2023-04-27 00:01:34,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
17: [2023-04-27 00:01:34,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 17: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 16: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 16: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
1: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
24: [2023-04-27 00:01:34,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:34,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
16: [2023-04-27 00:01:34,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 22: [2023-04-27 00:01:34,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
9: [2023-04-27 00:01:34,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:34,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:34,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
17: [2023-04-27 00:01:34,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 17: [2023-04-27 00:01:34,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
16: [2023-04-27 00:01:34,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:34,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 9: [2023-04-27 00:01:34,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
2: [2023-04-27 00:01:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 10: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
30: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
20: [2023-04-27 00:01:34,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 3: [2023-04-27 00:01:34,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
12: [2023-04-27 00:01:34,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
15: [2023-04-27 00:01:34,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 15: [2023-04-27 00:01:34,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 10: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 2: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 31: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
31: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 31: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 31: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
27: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 11: [2023-04-27 00:01:34,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 31: [2023-04-27 00:01:34,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 31: [2023-04-27 00:01:34,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 30: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 11: [2023-04-27 00:01:34,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 10: [2023-04-27 00:01:34,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 11: [2023-04-27 00:01:34,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 12: [2023-04-27 00:01:34,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
24: [2023-04-27 00:01:34,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 24: [2023-04-27 00:01:34,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
15: [2023-04-27 00:01:34,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 17: [2023-04-27 00:01:34,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
13: [2023-04-27 00:01:34,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 13: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
24: [2023-04-27 00:01:34,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 31: [2023-04-27 00:01:34,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
31: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 31: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
19: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 17: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 18: [2023-04-27 00:01:34,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 24: [2023-04-27 00:01:34,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
19: [2023-04-27 00:01:34,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 19: [2023-04-27 00:01:34,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 17: [2023-04-27 00:01:34,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 
25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 25: [2023-04-27 00:01:34,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 25: [2023-04-27 00:01:34,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 31: [2023-04-27 00:01:34,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 31: [2023-04-27 00:01:34,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt... 31: [2023-04-27 00:01:34,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
17: [2023-04-27 00:01:34,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 17: [2023-04-27 00:01:34,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 31: [2023-04-27 00:01:34,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 17: [2023-04-27 00:01:34,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 18: [2023-04-27 00:01:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
18: [2023-04-27 00:01:34,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 20: [2023-04-27 00:01:34,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
20: [2023-04-27 00:01:34,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 31: [2023-04-27 00:01:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 27: [2023-04-27 00:01:34,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
31: [2023-04-27 00:01:34,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 
31: [2023-04-27 00:01:34,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 25: [2023-04-27 00:01:34,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 25: [2023-04-27 00:01:34,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 19: [2023-04-27 00:01:34,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
28: [2023-04-27 00:01:34,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
25: [2023-04-27 00:01:34,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
5: [2023-04-27 00:01:34,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
5: [2023-04-27 00:01:34,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 5: [2023-04-27 00:01:34,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 19: [2023-04-27 00:01:34,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
8: [2023-04-27 00:01:34,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
28: [2023-04-27 00:01:34,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_10-model_00-model_states.pt. 5: [2023-04-27 00:01:34,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
28: [2023-04-27 00:01:34,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 28: [2023-04-27 00:01:34,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,371] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 28: [2023-04-27 00:01:34,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
5: [2023-04-27 00:01:34,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
28: [2023-04-27 00:01:34,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 5: [2023-04-27 00:01:34,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
29: [2023-04-27 00:01:34,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
29: [2023-04-27 00:01:34,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 8: [2023-04-27 00:01:34,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
6: [2023-04-27 00:01:34,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
6: [2023-04-27 00:01:34,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 8: [2023-04-27 00:01:34,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
8: [2023-04-27 00:01:34,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
29: [2023-04-27 00:01:34,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 29: [2023-04-27 00:01:34,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
4: [2023-04-27 00:01:34,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 6: [2023-04-27 00:01:34,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
4: [2023-04-27 00:01:34,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 29: [2023-04-27 00:01:34,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
6: [2023-04-27 00:01:34,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
2: [2023-04-27 00:01:34,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 6: [2023-04-27 00:01:34,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 4: [2023-04-27 00:01:34,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
4: [2023-04-27 00:01:34,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 4: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 11: [2023-04-27 00:01:34,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,473] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,479] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
4: [2023-04-27 00:01:34,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 4: [2023-04-27 00:01:34,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 4: [2023-04-27 00:01:34,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 4: [2023-04-27 00:01:34,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
4: [2023-04-27 00:01:34,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
11: [2023-04-27 00:01:34,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 2: [2023-04-27 00:01:34,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
11: [2023-04-27 00:01:34,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 4: [2023-04-27 00:01:34,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 13: [2023-04-27 00:01:34,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
2: [2023-04-27 00:01:34,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,497] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,497] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,498] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
13: [2023-04-27 00:01:34,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 11: [2023-04-27 00:01:34,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,499] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 13: [2023-04-27 00:01:34,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 11: [2023-04-27 00:01:34,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,500] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 2: [2023-04-27 00:01:34,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
11: [2023-04-27 00:01:34,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 0: [2023-04-27 00:01:34,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
13: [2023-04-27 00:01:34,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 0: [2023-04-27 00:01:34,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 11: [2023-04-27 00:01:34,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 0: [2023-04-27 00:01:34,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 0: [2023-04-27 00:01:34,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
1: [2023-04-27 00:01:34,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
16: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 16: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
26: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
26: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
15: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 26: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 26: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
15: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
26: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
21: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 15: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
15: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
21: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
30: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
22: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
21: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
13: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 23: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 27: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
13: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 24: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 1: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
30: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
30: [2023-04-27 00:01:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 27: [2023-04-27 00:01:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 30: [2023-04-27 00:01:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
12: [2023-04-27 00:01:34,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 1: [2023-04-27 00:01:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 13: [2023-04-27 00:01:34,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 17: [2023-04-27 00:01:34,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
17: [2023-04-27 00:01:34,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 16: [2023-04-27 00:01:34,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 13: [2023-04-27 00:01:34,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 13: [2023-04-27 00:01:34,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
13: [2023-04-27 00:01:34,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 22: [2023-04-27 00:01:34,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
16: [2023-04-27 00:01:34,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 14: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:34,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
14: [2023-04-27 00:01:34,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
7: [2023-04-27 00:01:34,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 14: [2023-04-27 00:01:34,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 21: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 7: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 
30: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 14: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 14: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 26: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
13: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 16: [2023-04-27 00:01:34,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 19: [2023-04-27 00:01:34,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
19: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
9: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 9: [2023-04-27 00:01:34,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 26: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 3: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
3: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 3: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 3: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 3: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 3: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 3: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
3: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 16: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
15: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 18: [2023-04-27 00:01:34,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 25: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
26: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 20: [2023-04-27 00:01:34,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 21: [2023-04-27 00:01:34,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
21: [2023-04-27 00:01:34,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 25: [2023-04-27 00:01:34,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 20: [2023-04-27 00:01:34,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 12: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
22: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 22: [2023-04-27 00:01:34,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
16: [2023-04-27 00:01:34,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 9: [2023-04-27 00:01:34,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 30: [2023-04-27 00:01:34,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
15: [2023-04-27 00:01:34,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 24: [2023-04-27 00:01:34,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:34,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 9: [2023-04-27 00:01:34,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 10: [2023-04-27 00:01:34,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt... 25: [2023-04-27 00:01:34,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
12: [2023-04-27 00:01:34,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
14: [2023-04-27 00:01:34,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 18: [2023-04-27 00:01:34,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
20: [2023-04-27 00:01:34,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 19: [2023-04-27 00:01:34,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
7: [2023-04-27 00:01:34,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
22: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 22: [2023-04-27 00:01:34,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 22: [2023-04-27 00:01:34,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
15: [2023-04-27 00:01:34,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
3: [2023-04-27 00:01:34,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 3: [2023-04-27 00:01:34,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 25: [2023-04-27 00:01:34,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 23: [2023-04-27 00:01:34,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
3: [2023-04-27 00:01:34,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 3: [2023-04-27 00:01:34,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 
18: [2023-04-27 00:01:34,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
19: [2023-04-27 00:01:34,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 19: [2023-04-27 00:01:34,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 22: [2023-04-27 00:01:34,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 15: [2023-04-27 00:01:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 26: [2023-04-27 00:01:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
27: [2023-04-27 00:01:34,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
10: [2023-04-27 00:01:34,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 12: [2023-04-27 00:01:34,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
3: [2023-04-27 00:01:34,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 25: [2023-04-27 00:01:34,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:34,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
3: [2023-04-27 00:01:34,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:34,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 27: [2023-04-27 00:01:34,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
7: [2023-04-27 00:01:34,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 7: [2023-04-27 00:01:34,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 19: [2023-04-27 00:01:34,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 3: [2023-04-27 00:01:34,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:34,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
10: [2023-04-27 00:01:34,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
14: [2023-04-27 00:01:34,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 14: [2023-04-27 00:01:34,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 10: [2023-04-27 00:01:34,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_11-model_00-model_states.pt. 17: [2023-04-27 00:01:34,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
10: [2023-04-27 00:01:34,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
5: [2023-04-27 00:01:34,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
5: [2023-04-27 00:01:34,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
5: [2023-04-27 00:01:34,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
6: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 4: [2023-04-27 00:01:34,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 4: [2023-04-27 00:01:34,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
4: [2023-04-27 00:01:34,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 4: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 4: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 4: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
4: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
11: [2023-04-27 00:01:34,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 31: [2023-04-27 00:01:34,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 6: [2023-04-27 00:01:34,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 5: [2023-04-27 00:01:34,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 31: [2023-04-27 00:01:34,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
6: [2023-04-27 00:01:34,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,658] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
4: [2023-04-27 00:01:34,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,661] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
31: [2023-04-27 00:01:34,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
4: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 4: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
4: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
28: [2023-04-27 00:01:34,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 11: [2023-04-27 00:01:34,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
31: [2023-04-27 00:01:34,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 31: [2023-04-27 00:01:34,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,686] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
8: [2023-04-27 00:01:34,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 31: [2023-04-27 00:01:34,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
8: [2023-04-27 00:01:34,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 28: [2023-04-27 00:01:34,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 31: [2023-04-27 00:01:34,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
28: [2023-04-27 00:01:34,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 28: [2023-04-27 00:01:34,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
8: [2023-04-27 00:01:34,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 1: [2023-04-27 00:01:34,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 8: [2023-04-27 00:01:34,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
8: [2023-04-27 00:01:34,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 0: [2023-04-27 00:01:34,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
0: [2023-04-27 00:01:34,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 0: [2023-04-27 00:01:34,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 0: [2023-04-27 00:01:34,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 0: [2023-04-27 00:01:34,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 13: [2023-04-27 00:01:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
13: [2023-04-27 00:01:34,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 13: [2023-04-27 00:01:34,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 13: [2023-04-27 00:01:34,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
8: [2023-04-27 00:01:34,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 8: [2023-04-27 00:01:34,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
1: [2023-04-27 00:01:34,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 1: [2023-04-27 00:01:34,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 16: [2023-04-27 00:01:34,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 16: [2023-04-27 00:01:34,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 16: [2023-04-27 00:01:34,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
16: [2023-04-27 00:01:34,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 16: [2023-04-27 00:01:34,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 16: [2023-04-27 00:01:34,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 16: [2023-04-27 00:01:34,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 16: [2023-04-27 00:01:34,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 16: [2023-04-27 00:01:34,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
16: [2023-04-27 00:01:34,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 16: [2023-04-27 00:01:34,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 16: [2023-04-27 00:01:34,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 16: [2023-04-27 00:01:34,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 0: [2023-04-27 00:01:34,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
24: [2023-04-27 00:01:34,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
24: [2023-04-27 00:01:34,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
18: [2023-04-27 00:01:34,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,756] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 9: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
18: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 0: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
0: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 21: [2023-04-27 00:01:34,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 21: [2023-04-27 00:01:34,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
21: [2023-04-27 00:01:34,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 19: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 19: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 9: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
16: [2023-04-27 00:01:34,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 19: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 19: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 9: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 19: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 9: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 19: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
19: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 22: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 21: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 21: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 21: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
21: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 21: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 21: [2023-04-27 00:01:34,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 0: [2023-04-27 00:01:34,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:34,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 16: [2023-04-27 00:01:34,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
21: [2023-04-27 00:01:34,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 16: [2023-04-27 00:01:34,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 19: [2023-04-27 00:01:34,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
13: [2023-04-27 00:01:34,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 16: [2023-04-27 00:01:34,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 0: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
13: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 16: [2023-04-27 00:01:34,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
9: [2023-04-27 00:01:34,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 9: [2023-04-27 00:01:34,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 16: [2023-04-27 00:01:34,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 0: [2023-04-27 00:01:34,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,772] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
24: [2023-04-27 00:01:34,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:34,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 15: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 0: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
2: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
2: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 2: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
16: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 26: [2023-04-27 00:01:34,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,777] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
24: [2023-04-27 00:01:34,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 13: [2023-04-27 00:01:34,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 21: [2023-04-27 00:01:34,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 19: [2023-04-27 00:01:34,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:34,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
24: [2023-04-27 00:01:34,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:34,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 16: [2023-04-27 00:01:34,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 19: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 23: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 24: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:34,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:34,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
18: [2023-04-27 00:01:34,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 24: [2023-04-27 00:01:34,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 0: [2023-04-27 00:01:34,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 21: [2023-04-27 00:01:34,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:34,785] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 18: [2023-04-27 00:01:34,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
22: [2023-04-27 00:01:34,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:34,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 24: [2023-04-27 00:01:34,788] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
22: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 18: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
16: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 27: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
27: [2023-04-27 00:01:34,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 27: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 16: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 29: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 29: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
14: [2023-04-27 00:01:34,790] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 30: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 21: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 27: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 27: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 13: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
27: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 27: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 27: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 27: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 18: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,791] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 14: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
18: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
7: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 7: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 29: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 29: [2023-04-27 00:01:34,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 29: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 29: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
21: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 19: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:34,793] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 17: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 20: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,794] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 29: [2023-04-27 00:01:34,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
26: [2023-04-27 00:01:34,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 0: [2023-04-27 00:01:34,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:34,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,795] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:34,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 
20: [2023-04-27 00:01:34,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 9: [2023-04-27 00:01:34,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 9: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
25: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 25: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 19: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
26: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 19: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 22: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 3: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
26: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 25: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 16: [2023-04-27 00:01:34,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:34,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
9: [2023-04-27 00:01:34,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:34,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 24: [2023-04-27 00:01:34,800] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:34,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 26: [2023-04-27 00:01:34,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 27: [2023-04-27 00:01:34,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 3: [2023-04-27 00:01:34,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:34,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
6: [2023-04-27 00:01:34,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 9: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
6: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:34,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:34,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:34,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:34,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
25: [2023-04-27 00:01:34,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt... 10: [2023-04-27 00:01:34,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 15: [2023-04-27 00:01:34,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 26: [2023-04-27 00:01:34,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 21: [2023-04-27 00:01:34,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
6: [2023-04-27 00:01:34,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
19: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 9: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
12: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 20: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,807] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 29: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 29: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 29: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 19: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
30: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 19: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,808] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
3: [2023-04-27 00:01:34,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 9: [2023-04-27 00:01:34,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:34,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 2: [2023-04-27 00:01:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
23: [2023-04-27 00:01:34,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,810] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
12: [2023-04-27 00:01:34,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 21: [2023-04-27 00:01:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 26: [2023-04-27 00:01:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 26: [2023-04-27 00:01:34,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 25: [2023-04-27 00:01:34,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 26: [2023-04-27 00:01:34,813] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 21: [2023-04-27 00:01:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
29: [2023-04-27 00:01:34,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 19: [2023-04-27 00:01:34,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:34,815] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
5: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 3: [2023-04-27 00:01:34,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
19: [2023-04-27 00:01:34,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:34,817] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 23: [2023-04-27 00:01:34,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
5: [2023-04-27 00:01:34,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 29: [2023-04-27 00:01:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
14: [2023-04-27 00:01:34,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 14: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
29: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 29: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:34,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:34,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:34,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 30: [2023-04-27 00:01:34,821] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 27: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 27: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 6: [2023-04-27 00:01:34,822] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
7: [2023-04-27 00:01:34,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 7: [2023-04-27 00:01:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:34,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 12: [2023-04-27 00:01:34,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
25: [2023-04-27 00:01:34,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:34,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 4: [2023-04-27 00:01:34,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:34,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:34,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
23: [2023-04-27 00:01:34,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 4: [2023-04-27 00:01:34,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 29: [2023-04-27 00:01:34,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:34,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
30: [2023-04-27 00:01:34,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 4: [2023-04-27 00:01:34,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:34,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
11: [2023-04-27 00:01:34,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 4: [2023-04-27 00:01:34,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 4: [2023-04-27 00:01:34,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 4: [2023-04-27 00:01:34,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 4: [2023-04-27 00:01:34,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 25: [2023-04-27 00:01:34,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
29: [2023-04-27 00:01:34,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:34,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 4: [2023-04-27 00:01:34,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
7: [2023-04-27 00:01:34,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:34,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:34,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
14: [2023-04-27 00:01:34,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:34,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 25: [2023-04-27 00:01:34,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:34,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
10: [2023-04-27 00:01:34,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:34,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:34,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:34,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:34,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,836] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 10: [2023-04-27 00:01:34,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
29: [2023-04-27 00:01:34,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:34,837] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 25: [2023-04-27 00:01:34,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 29: [2023-04-27 00:01:34,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:34,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 6: [2023-04-27 00:01:34,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:34,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 29: [2023-04-27 00:01:34,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:34,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:34,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
14: [2023-04-27 00:01:34,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 5: [2023-04-27 00:01:34,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:34,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:34,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:34,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:34,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:34,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:34,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:34,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 17: [2023-04-27 00:01:34,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
4: [2023-04-27 00:01:34,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:34,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:34,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:34,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:34,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:34,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 6: [2023-04-27 00:01:34,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:34,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_12-model_00-model_states.pt. 11: [2023-04-27 00:01:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 5: [2023-04-27 00:01:34,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:34,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:34,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:34,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 4: [2023-04-27 00:01:34,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
4: [2023-04-27 00:01:34,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 5: [2023-04-27 00:01:34,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:34,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 4: [2023-04-27 00:01:34,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 4: [2023-04-27 00:01:34,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
17: [2023-04-27 00:01:34,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,867] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 11: [2023-04-27 00:01:34,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:34,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 11: [2023-04-27 00:01:34,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 4: [2023-04-27 00:01:34,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:34,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:34,871] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:34,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:34,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:34,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:34,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:34,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
11: [2023-04-27 00:01:34,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:34,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
31: [2023-04-27 00:01:34,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 31: [2023-04-27 00:01:34,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
31: [2023-04-27 00:01:34,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:34,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:34,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:34,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
28: [2023-04-27 00:01:34,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 31: [2023-04-27 00:01:34,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 31: [2023-04-27 00:01:34,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:34,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 31: [2023-04-27 00:01:34,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 31: [2023-04-27 00:01:34,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:34,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:34,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
31: [2023-04-27 00:01:34,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:34,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:34,903] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:34,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 28: [2023-04-27 00:01:34,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
1: [2023-04-27 00:01:34,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:34,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
1: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
20: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
20: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
13: [2023-04-27 00:01:34,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 1: [2023-04-27 00:01:34,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
22: [2023-04-27 00:01:34,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
22: [2023-04-27 00:01:34,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 28: [2023-04-27 00:01:34,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:34,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 31: [2023-04-27 00:01:34,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:34,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
1: [2023-04-27 00:01:34,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:34,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:34,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:34,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
20: [2023-04-27 00:01:34,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:34,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:34,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:34,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:34,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
22: [2023-04-27 00:01:34,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:34,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:34,940] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:34,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 13: [2023-04-27 00:01:34,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
22: [2023-04-27 00:01:34,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 8: [2023-04-27 00:01:34,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 13: [2023-04-27 00:01:34,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:34,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
1: [2023-04-27 00:01:34,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:34,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:34,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 1: [2023-04-27 00:01:34,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 13: [2023-04-27 00:01:34,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 13: [2023-04-27 00:01:34,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:34,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:34,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 8: [2023-04-27 00:01:34,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:34,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
1: [2023-04-27 00:01:34,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 13: [2023-04-27 00:01:34,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 22: [2023-04-27 00:01:34,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 20: [2023-04-27 00:01:34,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:34,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
1: [2023-04-27 00:01:34,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 18: [2023-04-27 00:01:34,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 22: [2023-04-27 00:01:34,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:34,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
27: [2023-04-27 00:01:34,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 13: [2023-04-27 00:01:34,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:34,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
27: [2023-04-27 00:01:34,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 20: [2023-04-27 00:01:34,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:34,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
18: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
7: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
7: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
23: [2023-04-27 00:01:34,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:34,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:34,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:34,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:34,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
18: [2023-04-27 00:01:34,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:34,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:34,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 18: [2023-04-27 00:01:34,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 18: [2023-04-27 00:01:34,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:34,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:34,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
23: [2023-04-27 00:01:34,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:34,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:34,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
12: [2023-04-27 00:01:34,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:34,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:34,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
12: [2023-04-27 00:01:34,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:34,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:34,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:34,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:34,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 15: [2023-04-27 00:01:34,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:34,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:34,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:34,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
2: [2023-04-27 00:01:34,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:34,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 26: [2023-04-27 00:01:34,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 2: [2023-04-27 00:01:34,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:34,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:34,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:34,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:34,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 26: [2023-04-27 00:01:34,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:34,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:34,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
26: [2023-04-27 00:01:34,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:34,998] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:34,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:34,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:34,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:34,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:34,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:34,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:34,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
27: [2023-04-27 00:01:34,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:34,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
10: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
26: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:35,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 26: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
30: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
17: [2023-04-27 00:01:35,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
14: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:35,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 7: [2023-04-27 00:01:35,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:35,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:35,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
30: [2023-04-27 00:01:35,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:35,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:35,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:35,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:35,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 15: [2023-04-27 00:01:35,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 14: [2023-04-27 00:01:35,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:35,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:35,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
17: [2023-04-27 00:01:35,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:35,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 10: [2023-04-27 00:01:35,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 23: [2023-04-27 00:01:35,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 18: [2023-04-27 00:01:35,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 17: [2023-04-27 00:01:35,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:35,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 18: [2023-04-27 00:01:35,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
18: [2023-04-27 00:01:35,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:35,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:35,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 27: [2023-04-27 00:01:35,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:35,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:35,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:35,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:35,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:35,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:35,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
7: [2023-04-27 00:01:35,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:35,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:35,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:35,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 
25: [2023-04-27 00:01:35,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 25: [2023-04-27 00:01:35,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:35,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:35,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:35,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:35,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
25: [2023-04-27 00:01:35,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:35,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:35,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 30: [2023-04-27 00:01:35,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 27: [2023-04-27 00:01:35,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:35,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
25: [2023-04-27 00:01:35,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 25: [2023-04-27 00:01:35,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt... 12: [2023-04-27 00:01:35,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:35,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 23: [2023-04-27 00:01:35,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:35,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:35,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
12: [2023-04-27 00:01:35,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 12: [2023-04-27 00:01:35,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:35,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
30: [2023-04-27 00:01:35,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 7: [2023-04-27 00:01:35,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:35,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:35,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:35,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
27: [2023-04-27 00:01:35,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:35,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
30: [2023-04-27 00:01:35,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 2: [2023-04-27 00:01:35,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:35,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:35,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:35,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
10: [2023-04-27 00:01:35,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:35,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:35,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
17: [2023-04-27 00:01:35,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:35,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 26: [2023-04-27 00:01:35,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 30: [2023-04-27 00:01:35,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
12: [2023-04-27 00:01:35,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 14: [2023-04-27 00:01:35,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 10: [2023-04-27 00:01:35,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
26: [2023-04-27 00:01:35,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
26: [2023-04-27 00:01:35,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:35,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
25: [2023-04-27 00:01:35,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 
25: [2023-04-27 00:01:35,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 17: [2023-04-27 00:01:35,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_13-model_00-model_states.pt. 25: [2023-04-27 00:01:35,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:35,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
21: [2023-04-27 00:01:35,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:35,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:35,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:35,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
21: [2023-04-27 00:01:35,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:35,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:35,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:35,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:35,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
21: [2023-04-27 00:01:35,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 21: [2023-04-27 00:01:35,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
9: [2023-04-27 00:01:35,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
9: [2023-04-27 00:01:35,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 21: [2023-04-27 00:01:35,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
6: [2023-04-27 00:01:35,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:35,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:35,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:35,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
6: [2023-04-27 00:01:35,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:35,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:35,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:35,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 9: [2023-04-27 00:01:35,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
9: [2023-04-27 00:01:35,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 29: [2023-04-27 00:01:35,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 29: [2023-04-27 00:01:35,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 29: [2023-04-27 00:01:35,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
29: [2023-04-27 00:01:35,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 29: [2023-04-27 00:01:35,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 29: [2023-04-27 00:01:35,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 29: [2023-04-27 00:01:35,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
16: [2023-04-27 00:01:35,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 16: [2023-04-27 00:01:35,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,163] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:35,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 16: [2023-04-27 00:01:35,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
6: [2023-04-27 00:01:35,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
8: [2023-04-27 00:01:35,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:35,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 6: [2023-04-27 00:01:35,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 31: [2023-04-27 00:01:35,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 31: [2023-04-27 00:01:35,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
31: [2023-04-27 00:01:35,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 31: [2023-04-27 00:01:35,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 31: [2023-04-27 00:01:35,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 31: [2023-04-27 00:01:35,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 31: [2023-04-27 00:01:35,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 31: [2023-04-27 00:01:35,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 31: [2023-04-27 00:01:35,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 31: [2023-04-27 00:01:35,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 31: [2023-04-27 00:01:35,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
31: [2023-04-27 00:01:35,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 9: [2023-04-27 00:01:35,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:35,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:35,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
6: [2023-04-27 00:01:35,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 31: [2023-04-27 00:01:35,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:35,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
8: [2023-04-27 00:01:35,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 6: [2023-04-27 00:01:35,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 6: [2023-04-27 00:01:35,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 6: [2023-04-27 00:01:35,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 6: [2023-04-27 00:01:35,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 16: [2023-04-27 00:01:35,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
16: [2023-04-27 00:01:35,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 29: [2023-04-27 00:01:35,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
29: [2023-04-27 00:01:35,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 31: [2023-04-27 00:01:35,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:35,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:35,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 16: [2023-04-27 00:01:35,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 16: [2023-04-27 00:01:35,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
31: [2023-04-27 00:01:35,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 31: [2023-04-27 00:01:35,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 31: [2023-04-27 00:01:35,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
8: [2023-04-27 00:01:35,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 16: [2023-04-27 00:01:35,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
1: [2023-04-27 00:01:35,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 31: [2023-04-27 00:01:35,206] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
31: [2023-04-27 00:01:35,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 20: [2023-04-27 00:01:35,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
22: [2023-04-27 00:01:35,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:35,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
0: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
4: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 0: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
15: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
24: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
4: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 0: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
0: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 0: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 0: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 3: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
31: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
19: [2023-04-27 00:01:35,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 28: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:35,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 0: [2023-04-27 00:01:35,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
0: [2023-04-27 00:01:35,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 1: [2023-04-27 00:01:35,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 20: [2023-04-27 00:01:35,215] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 1: [2023-04-27 00:01:35,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,216] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:35,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:35,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
31: [2023-04-27 00:01:35,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:35,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:35,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
27: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 13: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 13: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
20: [2023-04-27 00:01:35,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 8: [2023-04-27 00:01:35,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 13: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 13: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
13: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 13: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:35,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
8: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
5: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
23: [2023-04-27 00:01:35,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 11: [2023-04-27 00:01:35,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:35,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:35,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 20: [2023-04-27 00:01:35,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
3: [2023-04-27 00:01:35,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:35,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 11: [2023-04-27 00:01:35,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
28: [2023-04-27 00:01:35,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 20: [2023-04-27 00:01:35,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,229] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 20: [2023-04-27 00:01:35,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
11: [2023-04-27 00:01:35,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 22: [2023-04-27 00:01:35,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
24: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
4: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
4: [2023-04-27 00:01:35,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 15: [2023-04-27 00:01:35,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
12: [2023-04-27 00:01:35,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 8: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
15: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
13: [2023-04-27 00:01:35,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 4: [2023-04-27 00:01:35,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 5: [2023-04-27 00:01:35,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
4: [2023-04-27 00:01:35,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 24: [2023-04-27 00:01:35,241] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 22: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
27: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 24: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 24: [2023-04-27 00:01:35,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
0: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 28: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 24: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
24: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 13: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
25: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 4: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 28: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
24: [2023-04-27 00:01:35,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 23: [2023-04-27 00:01:35,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,246] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
23: [2023-04-27 00:01:35,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
28: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 18: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
27: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 18: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 18: [2023-04-27 00:01:35,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 22: [2023-04-27 00:01:35,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 22: [2023-04-27 00:01:35,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 28: [2023-04-27 00:01:35,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
18: [2023-04-27 00:01:35,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 24: [2023-04-27 00:01:35,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 15: [2023-04-27 00:01:35,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 22: [2023-04-27 00:01:35,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
5: [2023-04-27 00:01:35,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
27: [2023-04-27 00:01:35,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 0: [2023-04-27 00:01:35,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 0: [2023-04-27 00:01:35,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
13: [2023-04-27 00:01:35,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
10: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
10: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 3: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 19: [2023-04-27 00:01:35,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
2: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 19: [2023-04-27 00:01:35,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
2: [2023-04-27 00:01:35,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 27: [2023-04-27 00:01:35,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
27: [2023-04-27 00:01:35,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 30: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 23: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
23: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 26: [2023-04-27 00:01:35,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 5: [2023-04-27 00:01:35,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,263] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
18: [2023-04-27 00:01:35,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 0: [2023-04-27 00:01:35,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,265] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
14: [2023-04-27 00:01:35,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 12: [2023-04-27 00:01:35,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
17: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
14: [2023-04-27 00:01:35,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 25: [2023-04-27 00:01:35,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:35,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
7: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 7: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 10: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 14: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
18: [2023-04-27 00:01:35,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 2: [2023-04-27 00:01:35,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 17: [2023-04-27 00:01:35,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 0: [2023-04-27 00:01:35,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 
25: [2023-04-27 00:01:35,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt... 18: [2023-04-27 00:01:35,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 12: [2023-04-27 00:01:35,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
12: [2023-04-27 00:01:35,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 27: [2023-04-27 00:01:35,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
30: [2023-04-27 00:01:35,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
10: [2023-04-27 00:01:35,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 25: [2023-04-27 00:01:35,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
10: [2023-04-27 00:01:35,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 14: [2023-04-27 00:01:35,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 10: [2023-04-27 00:01:35,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
26: [2023-04-27 00:01:35,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 26: [2023-04-27 00:01:35,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,287] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 2: [2023-04-27 00:01:35,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
2: [2023-04-27 00:01:35,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 30: [2023-04-27 00:01:35,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
30: [2023-04-27 00:01:35,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 18: [2023-04-27 00:01:35,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
25: [2023-04-27 00:01:35,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 14: [2023-04-27 00:01:35,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 14: [2023-04-27 00:01:35,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 14: [2023-04-27 00:01:35,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 14: [2023-04-27 00:01:35,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
10: [2023-04-27 00:01:35,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 14: [2023-04-27 00:01:35,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,298] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
26: [2023-04-27 00:01:35,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
17: [2023-04-27 00:01:35,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 14: [2023-04-27 00:01:35,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 14: [2023-04-27 00:01:35,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
7: [2023-04-27 00:01:35,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 17: [2023-04-27 00:01:35,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 14: [2023-04-27 00:01:35,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 7: [2023-04-27 00:01:35,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 
17: [2023-04-27 00:01:35,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_14-model_00-model_states.pt. 14: [2023-04-27 00:01:35,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 9: [2023-04-27 00:01:35,456] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
9: [2023-04-27 00:01:35,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,459] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
9: [2023-04-27 00:01:35,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 9: [2023-04-27 00:01:35,496] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
1: [2023-04-27 00:01:35,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
15: [2023-04-27 00:01:35,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
15: [2023-04-27 00:01:35,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 1: [2023-04-27 00:01:35,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
4: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
21: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
12: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
21: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
19: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
12: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
23: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
8: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 8: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
8: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 15: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
25: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 8: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
4: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
25: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 19: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 25: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
29: [2023-04-27 00:01:35,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 16: [2023-04-27 00:01:35,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:35,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,536] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
0: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 29: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
16: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 16: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
15: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 28: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 28: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 28: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 28: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 28: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 28: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
28: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 28: [2023-04-27 00:01:35,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 28: [2023-04-27 00:01:35,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 28: [2023-04-27 00:01:35,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
28: [2023-04-27 00:01:35,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:35,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
28: [2023-04-27 00:01:35,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 1: [2023-04-27 00:01:35,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
24: [2023-04-27 00:01:35,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 24: [2023-04-27 00:01:35,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 24: [2023-04-27 00:01:35,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 24: [2023-04-27 00:01:35,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
24: [2023-04-27 00:01:35,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 24: [2023-04-27 00:01:35,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 24: [2023-04-27 00:01:35,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
1: [2023-04-27 00:01:35,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 15: [2023-04-27 00:01:35,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
12: [2023-04-27 00:01:35,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
30: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
7: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 1: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 14: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
26: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
14: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 5: [2023-04-27 00:01:35,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
8: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
29: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
2: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 2: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 30: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
27: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 27: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
10: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 7: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
19: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 26: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 23: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 6: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 18: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 18: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
18: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 10: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
20: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 3: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 21: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 20: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
20: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 20: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 20: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 18: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
27: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 15: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 18: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 18: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 16: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
16: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 6: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 8: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
13: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 6: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
12: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 6: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 20: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 20: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 31: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
17: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 16: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 13: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 18: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 28: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 28: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 
22: [2023-04-27 00:01:35,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
18: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 12: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 11: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
10: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 4: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 18: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 22: [2023-04-27 00:01:35,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
10: [2023-04-27 00:01:35,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 0: [2023-04-27 00:01:35,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 23: [2023-04-27 00:01:35,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 11: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 21: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 23: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
22: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 17: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 29: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 29: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
8: [2023-04-27 00:01:35,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 28: [2023-04-27 00:01:35,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt... 21: [2023-04-27 00:01:35,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
3: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:35,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 3: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 29: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
12: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 15: [2023-04-27 00:01:35,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:35,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:35,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 8: [2023-04-27 00:01:35,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:35,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
23: [2023-04-27 00:01:35,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 23: [2023-04-27 00:01:35,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:35,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 29: [2023-04-27 00:01:35,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
29: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 28: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
20: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
10: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 4: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
30: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
8: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:35,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
28: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 12: [2023-04-27 00:01:35,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 11: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 31: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
19: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 28: [2023-04-27 00:01:35,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
24: [2023-04-27 00:01:35,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 24: [2023-04-27 00:01:35,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 21: [2023-04-27 00:01:35,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
28: [2023-04-27 00:01:35,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:35,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 29: [2023-04-27 00:01:35,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 25: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
21: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
20: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 29: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 6: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 0: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 5: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
28: [2023-04-27 00:01:35,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:35,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:35,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
4: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 11: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:35,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
25: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
5: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 8: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:35,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 8: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
19: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:35,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
10: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 29: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
13: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 21: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 19: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:35,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 26: [2023-04-27 00:01:35,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
22: [2023-04-27 00:01:35,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:35,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:35,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:35,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 23: [2023-04-27 00:01:35,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:35,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:35,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
6: [2023-04-27 00:01:35,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:35,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:35,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 18: [2023-04-27 00:01:35,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:35,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
6: [2023-04-27 00:01:35,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:35,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:35,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:35,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
20: [2023-04-27 00:01:35,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:35,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:35,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:35,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 8: [2023-04-27 00:01:35,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
24: [2023-04-27 00:01:35,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:35,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
17: [2023-04-27 00:01:35,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:35,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 30: [2023-04-27 00:01:35,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:35,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:35,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:35,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:35,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
22: [2023-04-27 00:01:35,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:35,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:35,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:35,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:35,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
26: [2023-04-27 00:01:35,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,597] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:35,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:35,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 13: [2023-04-27 00:01:35,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:35,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
8: [2023-04-27 00:01:35,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:35,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 6: [2023-04-27 00:01:35,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 2: [2023-04-27 00:01:35,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:35,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 17: [2023-04-27 00:01:35,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 
18: [2023-04-27 00:01:35,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:35,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 10: [2023-04-27 00:01:35,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 27: [2023-04-27 00:01:35,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 22: [2023-04-27 00:01:35,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 7: [2023-04-27 00:01:35,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 14: [2023-04-27 00:01:35,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
22: [2023-04-27 00:01:35,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:35,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:35,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:35,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:35,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
17: [2023-04-27 00:01:35,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_15-model_00-model_states.pt. 20: [2023-04-27 00:01:35,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:35,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:35,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:35,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:35,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:35,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:35,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:35,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:35,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
10: [2023-04-27 00:01:35,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:35,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:35,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:35,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:35,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:35,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:35,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:35,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:35,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:35,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
17: [2023-04-27 00:01:35,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:35,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,779] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,779] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,795] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
4: [2023-04-27 00:01:35,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:35,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 16: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
19: [2023-04-27 00:01:35,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,805] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:35,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:35,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
4: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 15: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:35,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:35,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:35,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:35,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:35,811] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 16: [2023-04-27 00:01:35,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
16: [2023-04-27 00:01:35,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,824] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 16: [2023-04-27 00:01:35,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
3: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
19: [2023-04-27 00:01:35,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 15: [2023-04-27 00:01:35,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:35,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
15: [2023-04-27 00:01:35,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:35,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:35,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,832] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:35,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:35,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:35,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:35,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
16: [2023-04-27 00:01:35,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:35,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:35,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:35,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:35,838] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:35,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:35,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:35,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
19: [2023-04-27 00:01:35,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:35,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:35,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
9: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 15: [2023-04-27 00:01:35,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:35,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
9: [2023-04-27 00:01:35,848] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 19: [2023-04-27 00:01:35,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:35,848] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:35,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:35,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:35,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:35,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:35,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
19: [2023-04-27 00:01:35,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:35,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:35,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:35,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:35,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:35,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
1: [2023-04-27 00:01:35,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 1: [2023-04-27 00:01:35,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 3: [2023-04-27 00:01:35,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 9: [2023-04-27 00:01:35,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,866] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,866] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:35,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:35,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:35,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
9: [2023-04-27 00:01:35,871] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:35,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:35,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:35,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:35,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:35,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:35,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
1: [2023-04-27 00:01:35,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 9: [2023-04-27 00:01:35,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:35,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:35,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:35,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:35,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:35,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
1: [2023-04-27 00:01:35,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:35,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:35,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:35,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 11: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 21: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
21: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 21: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 21: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 21: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 21: [2023-04-27 00:01:35,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 21: [2023-04-27 00:01:35,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
21: [2023-04-27 00:01:35,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 11: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
11: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 11: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 11: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 11: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,948] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 24: [2023-04-27 00:01:35,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
11: [2023-04-27 00:01:35,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 11: [2023-04-27 00:01:35,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
28: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 11: [2023-04-27 00:01:35,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
21: [2023-04-27 00:01:35,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
24: [2023-04-27 00:01:35,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 28: [2023-04-27 00:01:35,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:35,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 21: [2023-04-27 00:01:35,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
21: [2023-04-27 00:01:35,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:35,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:35,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:35,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:35,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:35,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:35,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:35,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:35,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:35,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
24: [2023-04-27 00:01:35,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:35,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:35,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:35,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 24: [2023-04-27 00:01:35,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:35,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:35,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 28: [2023-04-27 00:01:35,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
28: [2023-04-27 00:01:35,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:35,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:35,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:35,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 11: [2023-04-27 00:01:35,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
24: [2023-04-27 00:01:35,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:35,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:35,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:35,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:35,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:35,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:35,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 21: [2023-04-27 00:01:35,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:35,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:35,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
12: [2023-04-27 00:01:35,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:35,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:35,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:35,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:35,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:35,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:35,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:35,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
28: [2023-04-27 00:01:35,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 28: [2023-04-27 00:01:35,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:35,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:35,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:35,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:35,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 28: [2023-04-27 00:01:35,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:35,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
13: [2023-04-27 00:01:35,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:35,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:35,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:35,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:35,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:35,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:35,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 28: [2023-04-27 00:01:35,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
13: [2023-04-27 00:01:35,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:35,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:36,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:36,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:36,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:36,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:36,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:36,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
31: [2023-04-27 00:01:36,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:36,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:36,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:36,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:36,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
17: [2023-04-27 00:01:36,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:36,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:36,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:36,011] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:36,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:36,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:36,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
12: [2023-04-27 00:01:36,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:36,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:36,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:36,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
2: [2023-04-27 00:01:36,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:36,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:36,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:36,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:36,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:36,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:36,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:36,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
31: [2023-04-27 00:01:36,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:36,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:36,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:36,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
22: [2023-04-27 00:01:36,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 12: [2023-04-27 00:01:36,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
27: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:36,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
18: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 13: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
18: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 10: [2023-04-27 00:01:36,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 18: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
14: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 14: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 25: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
10: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 23: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:36,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 12: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 30: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
31: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:36,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:36,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 31: [2023-04-27 00:01:36,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:36,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 2: [2023-04-27 00:01:36,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
2: [2023-04-27 00:01:36,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:36,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 25: [2023-04-27 00:01:36,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 13: [2023-04-27 00:01:36,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
27: [2023-04-27 00:01:36,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 30: [2023-04-27 00:01:36,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,044] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
17: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
16: [2023-04-27 00:01:36,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 2: [2023-04-27 00:01:36,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
0: [2023-04-27 00:01:36,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 31: [2023-04-27 00:01:36,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,049] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:36,049] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 25: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 23: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 25: [2023-04-27 00:01:36,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 20: [2023-04-27 00:01:36,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
22: [2023-04-27 00:01:36,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 2: [2023-04-27 00:01:36,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 2: [2023-04-27 00:01:36,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 25: [2023-04-27 00:01:36,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 17: [2023-04-27 00:01:36,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
10: [2023-04-27 00:01:36,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
23: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 7: [2023-04-27 00:01:36,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 7: [2023-04-27 00:01:36,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:36,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:36,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
7: [2023-04-27 00:01:36,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 7: [2023-04-27 00:01:36,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:36,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 17: [2023-04-27 00:01:36,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 26: [2023-04-27 00:01:36,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
25: [2023-04-27 00:01:36,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 25: [2023-04-27 00:01:36,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 23: [2023-04-27 00:01:36,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
23: [2023-04-27 00:01:36,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 27: [2023-04-27 00:01:36,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 16: [2023-04-27 00:01:36,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
22: [2023-04-27 00:01:36,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
14: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 25: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 22: [2023-04-27 00:01:36,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:36,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 14: [2023-04-27 00:01:36,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
18: [2023-04-27 00:01:36,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:36,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 0: [2023-04-27 00:01:36,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 0: [2023-04-27 00:01:36,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:36,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
5: [2023-04-27 00:01:36,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
23: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 26: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
5: [2023-04-27 00:01:36,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 18: [2023-04-27 00:01:36,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
17: [2023-04-27 00:01:36,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 22: [2023-04-27 00:01:36,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 5: [2023-04-27 00:01:36,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 26: [2023-04-27 00:01:36,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 26: [2023-04-27 00:01:36,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
27: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
4: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:36,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
27: [2023-04-27 00:01:36,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 4: [2023-04-27 00:01:36,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:36,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 16: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 16: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
10: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 26: [2023-04-27 00:01:36,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 8: [2023-04-27 00:01:36,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:36,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
26: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 20: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 8: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
17: [2023-04-27 00:01:36,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 10: [2023-04-27 00:01:36,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 20: [2023-04-27 00:01:36,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
16: [2023-04-27 00:01:36,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:36,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 26: [2023-04-27 00:01:36,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
5: [2023-04-27 00:01:36,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 29: [2023-04-27 00:01:36,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 6: [2023-04-27 00:01:36,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 7: [2023-04-27 00:01:36,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 7: [2023-04-27 00:01:36,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 7: [2023-04-27 00:01:36,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
29: [2023-04-27 00:01:36,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 29: [2023-04-27 00:01:36,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 29: [2023-04-27 00:01:36,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:36,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 
7: [2023-04-27 00:01:36,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 5: [2023-04-27 00:01:36,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt... 27: [2023-04-27 00:01:36,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 4: [2023-04-27 00:01:36,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
19: [2023-04-27 00:01:36,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:36,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 6: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
6: [2023-04-27 00:01:36,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 6: [2023-04-27 00:01:36,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 6: [2023-04-27 00:01:36,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 3: [2023-04-27 00:01:36,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 6: [2023-04-27 00:01:36,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
7: [2023-04-27 00:01:36,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:36,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 19: [2023-04-27 00:01:36,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
29: [2023-04-27 00:01:36,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 19: [2023-04-27 00:01:36,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
4: [2023-04-27 00:01:36,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
7: [2023-04-27 00:01:36,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
3: [2023-04-27 00:01:36,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:36,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 8: [2023-04-27 00:01:36,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 
3: [2023-04-27 00:01:36,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 19: [2023-04-27 00:01:36,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
9: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
15: [2023-04-27 00:01:36,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 1: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
1: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
19: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:36,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:36,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
1: [2023-04-27 00:01:36,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:36,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
8: [2023-04-27 00:01:36,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 29: [2023-04-27 00:01:36,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 19: [2023-04-27 00:01:36,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
1: [2023-04-27 00:01:36,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_16-model_00-model_states.pt. 1: [2023-04-27 00:01:36,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
9: [2023-04-27 00:01:36,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 1: [2023-04-27 00:01:36,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 1: [2023-04-27 00:01:36,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
9: [2023-04-27 00:01:36,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:36,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 1: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
24: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:36,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
24: [2023-04-27 00:01:36,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:36,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:36,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,153] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
1: [2023-04-27 00:01:36,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,156] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:36,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
11: [2023-04-27 00:01:36,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:36,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,164] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
24: [2023-04-27 00:01:36,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
11: [2023-04-27 00:01:36,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,179] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
11: [2023-04-27 00:01:36,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
21: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
11: [2023-04-27 00:01:36,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 11: [2023-04-27 00:01:36,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
21: [2023-04-27 00:01:36,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
21: [2023-04-27 00:01:36,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 28: [2023-04-27 00:01:36,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 28: [2023-04-27 00:01:36,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 28: [2023-04-27 00:01:36,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
28: [2023-04-27 00:01:36,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 28: [2023-04-27 00:01:36,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 28: [2023-04-27 00:01:36,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 28: [2023-04-27 00:01:36,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
12: [2023-04-27 00:01:36,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:36,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:36,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:36,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:36,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
12: [2023-04-27 00:01:36,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:36,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:36,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 12: [2023-04-27 00:01:36,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
31: [2023-04-27 00:01:36,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,245] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
31: [2023-04-27 00:01:36,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,247] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,248] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
12: [2023-04-27 00:01:36,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
12: [2023-04-27 00:01:36,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
12: [2023-04-27 00:01:36,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 28: [2023-04-27 00:01:36,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
30: [2023-04-27 00:01:36,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
30: [2023-04-27 00:01:36,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
29: [2023-04-27 00:01:36,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
13: [2023-04-27 00:01:36,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 22: [2023-04-27 00:01:36,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
17: [2023-04-27 00:01:36,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 17: [2023-04-27 00:01:36,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 30: [2023-04-27 00:01:36,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
4: [2023-04-27 00:01:36,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
4: [2023-04-27 00:01:36,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
17: [2023-04-27 00:01:36,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
18: [2023-04-27 00:01:36,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 13: [2023-04-27 00:01:36,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 29: [2023-04-27 00:01:36,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
30: [2023-04-27 00:01:36,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
29: [2023-04-27 00:01:36,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
2: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 13: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
9: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 2: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 4: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 13: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 2: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
2: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 2: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 2: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 2: [2023-04-27 00:01:36,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 9: [2023-04-27 00:01:36,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 18: [2023-04-27 00:01:36,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
2: [2023-04-27 00:01:36,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 2: [2023-04-27 00:01:36,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 18: [2023-04-27 00:01:36,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 18: [2023-04-27 00:01:36,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 18: [2023-04-27 00:01:36,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 13: [2023-04-27 00:01:36,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
2: [2023-04-27 00:01:36,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 29: [2023-04-27 00:01:36,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 18: [2023-04-27 00:01:36,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 13: [2023-04-27 00:01:36,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 4: [2023-04-27 00:01:36,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
17: [2023-04-27 00:01:36,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 18: [2023-04-27 00:01:36,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 13: [2023-04-27 00:01:36,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
13: [2023-04-27 00:01:36,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
3: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 13: [2023-04-27 00:01:36,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
2: [2023-04-27 00:01:36,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 4: [2023-04-27 00:01:36,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
13: [2023-04-27 00:01:36,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 22: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
7: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 30: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
2: [2023-04-27 00:01:36,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 17: [2023-04-27 00:01:36,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
15: [2023-04-27 00:01:36,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 15: [2023-04-27 00:01:36,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 15: [2023-04-27 00:01:36,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 15: [2023-04-27 00:01:36,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 15: [2023-04-27 00:01:36,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 15: [2023-04-27 00:01:36,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 15: [2023-04-27 00:01:36,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
15: [2023-04-27 00:01:36,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 15: [2023-04-27 00:01:36,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 13: [2023-04-27 00:01:36,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
15: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
11: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 15: [2023-04-27 00:01:36,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 3: [2023-04-27 00:01:36,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
9: [2023-04-27 00:01:36,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 1: [2023-04-27 00:01:36,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
9: [2023-04-27 00:01:36,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 1: [2023-04-27 00:01:36,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 1: [2023-04-27 00:01:36,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 3: [2023-04-27 00:01:36,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
1: [2023-04-27 00:01:36,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 2: [2023-04-27 00:01:36,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 1: [2023-04-27 00:01:36,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
3: [2023-04-27 00:01:36,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 3: [2023-04-27 00:01:36,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 16: [2023-04-27 00:01:36,370] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
15: [2023-04-27 00:01:36,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 25: [2023-04-27 00:01:36,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 2: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
24: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
24: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 25: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
3: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 25: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 25: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 25: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 25: [2023-04-27 00:01:36,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 25: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
2: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 7: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
21: [2023-04-27 00:01:36,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 25: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 31: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 25: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
31: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 31: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 31: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 15: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 31: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
15: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 31: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 31: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 31: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
2: [2023-04-27 00:01:36,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 9: [2023-04-27 00:01:36,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 1: [2023-04-27 00:01:36,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 31: [2023-04-27 00:01:36,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
31: [2023-04-27 00:01:36,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 3: [2023-04-27 00:01:36,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,380] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 3: [2023-04-27 00:01:36,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
2: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 11: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 1: [2023-04-27 00:01:36,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
11: [2023-04-27 00:01:36,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 1: [2023-04-27 00:01:36,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 1: [2023-04-27 00:01:36,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
23: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 2: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 3: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
11: [2023-04-27 00:01:36,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 15: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
24: [2023-04-27 00:01:36,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 11: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
11: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 3: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
1: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
6: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 10: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 10: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 10: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 3: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
24: [2023-04-27 00:01:36,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 25: [2023-04-27 00:01:36,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:36,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 16: [2023-04-27 00:01:36,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
10: [2023-04-27 00:01:36,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 10: [2023-04-27 00:01:36,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 10: [2023-04-27 00:01:36,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 15: [2023-04-27 00:01:36,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
15: [2023-04-27 00:01:36,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 14: [2023-04-27 00:01:36,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 24: [2023-04-27 00:01:36,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 10: [2023-04-27 00:01:36,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 7: [2023-04-27 00:01:36,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
10: [2023-04-27 00:01:36,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 1: [2023-04-27 00:01:36,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 25: [2023-04-27 00:01:36,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 16: [2023-04-27 00:01:36,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 16: [2023-04-27 00:01:36,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 16: [2023-04-27 00:01:36,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
20: [2023-04-27 00:01:36,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 20: [2023-04-27 00:01:36,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 16: [2023-04-27 00:01:36,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 15: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 20: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
20: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 20: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 20: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 20: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 21: [2023-04-27 00:01:36,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
24: [2023-04-27 00:01:36,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 24: [2023-04-27 00:01:36,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 16: [2023-04-27 00:01:36,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
24: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 16: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 15: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
20: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 20: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 24: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 21: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 25: [2023-04-27 00:01:36,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
5: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 24: [2023-04-27 00:01:36,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,408] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
27: [2023-04-27 00:01:36,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 25: [2023-04-27 00:01:36,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 26: [2023-04-27 00:01:36,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
8: [2023-04-27 00:01:36,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 23: [2023-04-27 00:01:36,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
27: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 27: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 10: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 5: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 5: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
15: [2023-04-27 00:01:36,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 25: [2023-04-27 00:01:36,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,413] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
1: [2023-04-27 00:01:36,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 0: [2023-04-27 00:01:36,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
8: [2023-04-27 00:01:36,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 8: [2023-04-27 00:01:36,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 6: [2023-04-27 00:01:36,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 21: [2023-04-27 00:01:36,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 
8: [2023-04-27 00:01:36,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 0: [2023-04-27 00:01:36,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 24: [2023-04-27 00:01:36,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 31: [2023-04-27 00:01:36,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,417] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
0: [2023-04-27 00:01:36,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 10: [2023-04-27 00:01:36,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
23: [2023-04-27 00:01:36,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt... 31: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 19: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
19: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 20: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 19: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
19: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 19: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 19: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
6: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
26: [2023-04-27 00:01:36,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 19: [2023-04-27 00:01:36,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 31: [2023-04-27 00:01:36,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
6: [2023-04-27 00:01:36,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 23: [2023-04-27 00:01:36,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,430] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
12: [2023-04-27 00:01:36,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,432] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,433] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
12: [2023-04-27 00:01:36,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 10: [2023-04-27 00:01:36,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
5: [2023-04-27 00:01:36,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 10: [2023-04-27 00:01:36,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 14: [2023-04-27 00:01:36,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 20: [2023-04-27 00:01:36,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 20: [2023-04-27 00:01:36,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 10: [2023-04-27 00:01:36,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
27: [2023-04-27 00:01:36,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 19: [2023-04-27 00:01:36,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 6: [2023-04-27 00:01:36,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 19: [2023-04-27 00:01:36,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
8: [2023-04-27 00:01:36,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 26: [2023-04-27 00:01:36,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 20: [2023-04-27 00:01:36,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 19: [2023-04-27 00:01:36,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
10: [2023-04-27 00:01:36,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 8: [2023-04-27 00:01:36,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
19: [2023-04-27 00:01:36,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 19: [2023-04-27 00:01:36,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,451] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 27: [2023-04-27 00:01:36,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
27: [2023-04-27 00:01:36,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 19: [2023-04-27 00:01:36,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
19: [2023-04-27 00:01:36,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 19: [2023-04-27 00:01:36,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,460] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 12: [2023-04-27 00:01:36,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 19: [2023-04-27 00:01:36,461] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,462] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
12: [2023-04-27 00:01:36,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 12: [2023-04-27 00:01:36,466] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 19: [2023-04-27 00:01:36,468] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 12: [2023-04-27 00:01:36,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 19: [2023-04-27 00:01:36,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_17-model_00-model_states.pt. 
8: [2023-04-27 00:01:36,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,484] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
28: [2023-04-27 00:01:36,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,485] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
17: [2023-04-27 00:01:36,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,502] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
17: [2023-04-27 00:01:36,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 28: [2023-04-27 00:01:36,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
28: [2023-04-27 00:01:36,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,514] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
17: [2023-04-27 00:01:36,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 28: [2023-04-27 00:01:36,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
30: [2023-04-27 00:01:36,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
17: [2023-04-27 00:01:36,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
29: [2023-04-27 00:01:36,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
30: [2023-04-27 00:01:36,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,550] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 17: [2023-04-27 00:01:36,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,557] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,559] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
30: [2023-04-27 00:01:36,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
30: [2023-04-27 00:01:36,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 18: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
18: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 18: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 30: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 18: [2023-04-27 00:01:36,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 18: [2023-04-27 00:01:36,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
29: [2023-04-27 00:01:36,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 17: [2023-04-27 00:01:36,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,574] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
17: [2023-04-27 00:01:36,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,576] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,577] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,578] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
14: [2023-04-27 00:01:36,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,578] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 30: [2023-04-27 00:01:36,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 29: [2023-04-27 00:01:36,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 29: [2023-04-27 00:01:36,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
29: [2023-04-27 00:01:36,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
7: [2023-04-27 00:01:36,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 18: [2023-04-27 00:01:36,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
5: [2023-04-27 00:01:36,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
5: [2023-04-27 00:01:36,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 18: [2023-04-27 00:01:36,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
18: [2023-04-27 00:01:36,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,605] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 8: [2023-04-27 00:01:36,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
18: [2023-04-27 00:01:36,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,609] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
22: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
22: [2023-04-27 00:01:36,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
25: [2023-04-27 00:01:36,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 25: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
23: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
26: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 13: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
0: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 18: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
23: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 13: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 13: [2023-04-27 00:01:36,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
13: [2023-04-27 00:01:36,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,617] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
0: [2023-04-27 00:01:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 0: [2023-04-27 00:01:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 7: [2023-04-27 00:01:36,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 5: [2023-04-27 00:01:36,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 9: [2023-04-27 00:01:36,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
27: [2023-04-27 00:01:36,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
20: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 20: [2023-04-27 00:01:36,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 2: [2023-04-27 00:01:36,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
27: [2023-04-27 00:01:36,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 
20: [2023-04-27 00:01:36,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 27: [2023-04-27 00:01:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 5: [2023-04-27 00:01:36,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 14: [2023-04-27 00:01:36,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,625] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 5: [2023-04-27 00:01:36,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 5: [2023-04-27 00:01:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
9: [2023-04-27 00:01:36,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
23: [2023-04-27 00:01:36,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
5: [2023-04-27 00:01:36,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 14: [2023-04-27 00:01:36,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
5: [2023-04-27 00:01:36,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 7: [2023-04-27 00:01:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
13: [2023-04-27 00:01:36,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 26: [2023-04-27 00:01:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
0: [2023-04-27 00:01:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
20: [2023-04-27 00:01:36,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 23: [2023-04-27 00:01:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,641] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
8: [2023-04-27 00:01:36,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 22: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 9: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 10: [2023-04-27 00:01:36,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,645] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
10: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 26: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,646] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
11: [2023-04-27 00:01:36,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
22: [2023-04-27 00:01:36,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 2: [2023-04-27 00:01:36,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
25: [2023-04-27 00:01:36,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 23: [2023-04-27 00:01:36,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 13: [2023-04-27 00:01:36,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
27: [2023-04-27 00:01:36,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
4: [2023-04-27 00:01:36,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,653] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,654] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,655] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 25: [2023-04-27 00:01:36,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
25: [2023-04-27 00:01:36,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 8: [2023-04-27 00:01:36,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 22: [2023-04-27 00:01:36,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 23: [2023-04-27 00:01:36,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt... 6: [2023-04-27 00:01:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
6: [2023-04-27 00:01:36,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,648] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,651] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,656] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
7: [2023-04-27 00:01:36,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 4: [2023-04-27 00:01:36,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:36,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
23: [2023-04-27 00:01:36,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
27: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 13: [2023-04-27 00:01:36,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 
11: [2023-04-27 00:01:36,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 13: [2023-04-27 00:01:36,665] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
11: [2023-04-27 00:01:36,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,667] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 20: [2023-04-27 00:01:36,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 4: [2023-04-27 00:01:36,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:36,669] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
10: [2023-04-27 00:01:36,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 27: [2023-04-27 00:01:36,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 0: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
21: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
21: [2023-04-27 00:01:36,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 10: [2023-04-27 00:01:36,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 21: [2023-04-27 00:01:36,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
1: [2023-04-27 00:01:36,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
1: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,679] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
15: [2023-04-27 00:01:36,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,680] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:36,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,685] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
11: [2023-04-27 00:01:36,687] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:36,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
10: [2023-04-27 00:01:36,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
3: [2023-04-27 00:01:36,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 3: [2023-04-27 00:01:36,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 1: [2023-04-27 00:01:36,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
16: [2023-04-27 00:01:36,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 16: [2023-04-27 00:01:36,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 16: [2023-04-27 00:01:36,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
16: [2023-04-27 00:01:36,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 16: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:36,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_18-model_00-model_states.pt. 6: [2023-04-27 00:01:36,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 16: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
12: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
1: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 16: [2023-04-27 00:01:36,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 16: [2023-04-27 00:01:36,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
16: [2023-04-27 00:01:36,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:36,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
1: [2023-04-27 00:01:36,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 3: [2023-04-27 00:01:36,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:36,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 21: [2023-04-27 00:01:36,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
1: [2023-04-27 00:01:36,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
15: [2023-04-27 00:01:36,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 3: [2023-04-27 00:01:36,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
3: [2023-04-27 00:01:36,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
3: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 19: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
19: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 19: [2023-04-27 00:01:36,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 19: [2023-04-27 00:01:36,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 12: [2023-04-27 00:01:36,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:36,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
28: [2023-04-27 00:01:36,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:36,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 19: [2023-04-27 00:01:36,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
12: [2023-04-27 00:01:36,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:36,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:36,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 28: [2023-04-27 00:01:36,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
28: [2023-04-27 00:01:36,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:36,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:36,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:36,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 16: [2023-04-27 00:01:36,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
24: [2023-04-27 00:01:36,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 24: [2023-04-27 00:01:36,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 24: [2023-04-27 00:01:36,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 24: [2023-04-27 00:01:36,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 24: [2023-04-27 00:01:36,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
24: [2023-04-27 00:01:36,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 24: [2023-04-27 00:01:36,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 28: [2023-04-27 00:01:36,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 19: [2023-04-27 00:01:36,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
16: [2023-04-27 00:01:36,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:36,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:36,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
30: [2023-04-27 00:01:36,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
19: [2023-04-27 00:01:36,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:36,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:36,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:36,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
24: [2023-04-27 00:01:36,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 19: [2023-04-27 00:01:36,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:36,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
30: [2023-04-27 00:01:36,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:36,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 24: [2023-04-27 00:01:36,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:36,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:36,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:36,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
30: [2023-04-27 00:01:36,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 30: [2023-04-27 00:01:36,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:36,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:36,781] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:36,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:36,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:36,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
30: [2023-04-27 00:01:36,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:36,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
31: [2023-04-27 00:01:36,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,798] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 30: [2023-04-27 00:01:36,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:36,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
29: [2023-04-27 00:01:36,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
18: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 29: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
18: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 18: [2023-04-27 00:01:36,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
31: [2023-04-27 00:01:36,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,817] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
29: [2023-04-27 00:01:36,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,821] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 18: [2023-04-27 00:01:36,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
18: [2023-04-27 00:01:36,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:36,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 29: [2023-04-27 00:01:36,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:36,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,830] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:36,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:36,831] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
9: [2023-04-27 00:01:36,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:36,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:36,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:36,833] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:36,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:36,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:36,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:36,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
31: [2023-04-27 00:01:36,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:36,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:36,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:36,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:36,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:36,836] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:36,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:36,839] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 31: [2023-04-27 00:01:36,840] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:36,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
22: [2023-04-27 00:01:36,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 31: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
22: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,844] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,846] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:36,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:36,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
22: [2023-04-27 00:01:36,854] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:36,855] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:36,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
7: [2023-04-27 00:01:36,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:36,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:36,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:36,868] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 22: [2023-04-27 00:01:36,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
22: [2023-04-27 00:01:36,868] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:36,870] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 7: [2023-04-27 00:01:36,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
13: [2023-04-27 00:01:36,875] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,875] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
13: [2023-04-27 00:01:36,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:36,878] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:36,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
2: [2023-04-27 00:01:36,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
2: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 23: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 23: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 9: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
23: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 9: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 23: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 23: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 23: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
23: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
2: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 22: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
17: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 23: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,882] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 5: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 5: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,883] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
5: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 5: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 5: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 5: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 5: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
10: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
10: [2023-04-27 00:01:36,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 22: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
7: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 0: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 0: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 0: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 6: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
20: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 0: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 0: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
0: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 0: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
27: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 27: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
25: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 20: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,886] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 13: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,887] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
25: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 17: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
8: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 7: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 0: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:36,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
8: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:36,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:36,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 25: [2023-04-27 00:01:36,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
25: [2023-04-27 00:01:36,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 14: [2023-04-27 00:01:36,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 10: [2023-04-27 00:01:36,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:36,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 15: [2023-04-27 00:01:36,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 7: [2023-04-27 00:01:36,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 8: [2023-04-27 00:01:36,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 2: [2023-04-27 00:01:36,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:36,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt... 
20: [2023-04-27 00:01:36,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 23: [2023-04-27 00:01:36,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 23: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 23: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 13: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,896] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
11: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
17: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 2: [2023-04-27 00:01:36,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:36,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
10: [2023-04-27 00:01:36,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,899] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:36,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,899] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:36,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:36,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
11: [2023-04-27 00:01:36,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:36,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:36,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 13: [2023-04-27 00:01:36,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,902] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:36,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
23: [2023-04-27 00:01:36,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,903] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 7: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 5: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
5: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 23: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,904] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:36,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:36,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
23: [2023-04-27 00:01:36,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:36,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:36,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:36,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:36,906] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:36,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
0: [2023-04-27 00:01:36,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:36,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:36,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:36,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:36,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:36,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
20: [2023-04-27 00:01:36,909] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:36,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:36,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:36,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:36,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 11: [2023-04-27 00:01:36,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
2: [2023-04-27 00:01:36,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:36,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
21: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
21: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
13: [2023-04-27 00:01:36,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 13: [2023-04-27 00:01:36,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 5: [2023-04-27 00:01:36,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:36,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:36,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 7: [2023-04-27 00:01:36,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 14: [2023-04-27 00:01:36,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
6: [2023-04-27 00:01:36,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:36,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 7: [2023-04-27 00:01:36,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 13: [2023-04-27 00:01:36,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:36,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:36,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:36,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:36,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
11: [2023-04-27 00:01:36,915] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 5: [2023-04-27 00:01:36,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:36,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
4: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 4: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 27: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 14: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 14: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
14: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 15: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:36,917] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 0: [2023-04-27 00:01:36,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:36,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:36,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 0: [2023-04-27 00:01:36,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 0: [2023-04-27 00:01:36,918] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
25: [2023-04-27 00:01:36,918] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:36,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
12: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
12: [2023-04-27 00:01:36,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:36,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,920] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:36,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,921] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 14: [2023-04-27 00:01:36,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
23: [2023-04-27 00:01:36,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 20: [2023-04-27 00:01:36,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 2: [2023-04-27 00:01:36,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 2: [2023-04-27 00:01:36,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:36,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:36,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
8: [2023-04-27 00:01:36,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 26: [2023-04-27 00:01:36,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:36,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:36,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:36,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:36,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:36,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
5: [2023-04-27 00:01:36,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 10: [2023-04-27 00:01:36,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:36,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:36,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 6: [2023-04-27 00:01:36,924] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:36,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 2: [2023-04-27 00:01:36,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
20: [2023-04-27 00:01:36,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:36,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:36,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:36,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:36,926] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:36,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:36,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
1: [2023-04-27 00:01:36,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:36,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:36,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:36,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:36,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
4: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
27: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:36,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:36,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 
16: [2023-04-27 00:01:36,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:36,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:36,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:36,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:36,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
27: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 27: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 14: [2023-04-27 00:01:36,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
14: [2023-04-27 00:01:36,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:36,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:36,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:36,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:36,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 1: [2023-04-27 00:01:36,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:36,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:36,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:36,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
8: [2023-04-27 00:01:36,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:36,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:36,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 1: [2023-04-27 00:01:36,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:36,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:36,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:36,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:36,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
12: [2023-04-27 00:01:36,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:36,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:36,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:36,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:36,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:36,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 21: [2023-04-27 00:01:36,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
21: [2023-04-27 00:01:36,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:36,938] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:36,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:36,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:36,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,941] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:36,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:36,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
8: [2023-04-27 00:01:36,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 17: [2023-04-27 00:01:36,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:36,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:36,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,944] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:36,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:36,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 21: [2023-04-27 00:01:36,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
15: [2023-04-27 00:01:36,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:36,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:36,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:36,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:36,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 8: [2023-04-27 00:01:36,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 25: [2023-04-27 00:01:36,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:36,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:36,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:36,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
3: [2023-04-27 00:01:36,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
3: [2023-04-27 00:01:36,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:36,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:36,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 21: [2023-04-27 00:01:36,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:36,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_19-model_00-model_states.pt. 12: [2023-04-27 00:01:36,949] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
16: [2023-04-27 00:01:36,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:36,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:36,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:36,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:36,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:36,954] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 8: [2023-04-27 00:01:36,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
16: [2023-04-27 00:01:36,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:36,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:36,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:36,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:36,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:36,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:36,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:36,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:36,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:36,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
16: [2023-04-27 00:01:36,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:36,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:36,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:36,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:36,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:36,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:36,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:36,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:36,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:36,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:36,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
3: [2023-04-27 00:01:36,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:36,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,990] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:36,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:36,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
24: [2023-04-27 00:01:36,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:36,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:36,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:36,999] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
28: [2023-04-27 00:01:37,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:37,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:37,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 24: [2023-04-27 00:01:37,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:37,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
28: [2023-04-27 00:01:37,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:37,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:37,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:37,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:37,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
28: [2023-04-27 00:01:37,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,040] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
28: [2023-04-27 00:01:37,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,048] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:37,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:37,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:37,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
30: [2023-04-27 00:01:37,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:37,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:37,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:37,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:37,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:37,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
29: [2023-04-27 00:01:37,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:37,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:37,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:37,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
19: [2023-04-27 00:01:37,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:37,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:37,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:37,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
29: [2023-04-27 00:01:37,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:37,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:37,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:37,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 19: [2023-04-27 00:01:37,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:37,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:37,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
19: [2023-04-27 00:01:37,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:37,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
19: [2023-04-27 00:01:37,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
30: [2023-04-27 00:01:37,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
19: [2023-04-27 00:01:37,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
29: [2023-04-27 00:01:37,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
19: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
9: [2023-04-27 00:01:37,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 22: [2023-04-27 00:01:37,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:37,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:37,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
22: [2023-04-27 00:01:37,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:37,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:37,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:37,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:37,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
9: [2023-04-27 00:01:37,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:37,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,160] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
22: [2023-04-27 00:01:37,169] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:37,170] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 10: [2023-04-27 00:01:37,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
10: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:37,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:37,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:37,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 18: [2023-04-27 00:01:37,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
10: [2023-04-27 00:01:37,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:37,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:37,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:37,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 10: [2023-04-27 00:01:37,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 9: [2023-04-27 00:01:37,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
10: [2023-04-27 00:01:37,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 22: [2023-04-27 00:01:37,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 9: [2023-04-27 00:01:37,179] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,180] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,180] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,181] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 9: [2023-04-27 00:01:37,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:37,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 22: [2023-04-27 00:01:37,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,191] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
22: [2023-04-27 00:01:37,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,195] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 22: [2023-04-27 00:01:37,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:37,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
18: [2023-04-27 00:01:37,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:37,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,205] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:37,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,207] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
25: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
4: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:37,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:37,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:37,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 31: [2023-04-27 00:01:37,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
31: [2023-04-27 00:01:37,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:37,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:37,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:37,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
15: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 31: [2023-04-27 00:01:37,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 15: [2023-04-27 00:01:37,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 18: [2023-04-27 00:01:37,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
10: [2023-04-27 00:01:37,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:37,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 15: [2023-04-27 00:01:37,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
15: [2023-04-27 00:01:37,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:37,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:37,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
4: [2023-04-27 00:01:37,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,231] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,231] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,232] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:37,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
1: [2023-04-27 00:01:37,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 31: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
1: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:37,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:37,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 7: [2023-04-27 00:01:37,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:37,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
7: [2023-04-27 00:01:37,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 7: [2023-04-27 00:01:37,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 7: [2023-04-27 00:01:37,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 7: [2023-04-27 00:01:37,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 7: [2023-04-27 00:01:37,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 7: [2023-04-27 00:01:37,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 7: [2023-04-27 00:01:37,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 7: [2023-04-27 00:01:37,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 7: [2023-04-27 00:01:37,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 31: [2023-04-27 00:01:37,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 7: [2023-04-27 00:01:37,239] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
7: [2023-04-27 00:01:37,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:37,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:37,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 7: [2023-04-27 00:01:37,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 25: [2023-04-27 00:01:37,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
31: [2023-04-27 00:01:37,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 4: [2023-04-27 00:01:37,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 4: [2023-04-27 00:01:37,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
31: [2023-04-27 00:01:37,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:37,246] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 31: [2023-04-27 00:01:37,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:37,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 25: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
21: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
21: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
12: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 4: [2023-04-27 00:01:37,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
21: [2023-04-27 00:01:37,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:37,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,257] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:37,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
1: [2023-04-27 00:01:37,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,259] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:37,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:37,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
24: [2023-04-27 00:01:37,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
24: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
21: [2023-04-27 00:01:37,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 24: [2023-04-27 00:01:37,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:37,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 1: [2023-04-27 00:01:37,269] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 7: [2023-04-27 00:01:37,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:37,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
12: [2023-04-27 00:01:37,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:37,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:37,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
6: [2023-04-27 00:01:37,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 11: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 6: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
6: [2023-04-27 00:01:37,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 16: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 16: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 16: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:37,275] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:37,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 16: [2023-04-27 00:01:37,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 16: [2023-04-27 00:01:37,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
6: [2023-04-27 00:01:37,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:37,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 16: [2023-04-27 00:01:37,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:37,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:37,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:37,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
16: [2023-04-27 00:01:37,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,277] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 16: [2023-04-27 00:01:37,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 21: [2023-04-27 00:01:37,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 12: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 7: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 13: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 7: [2023-04-27 00:01:37,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,281] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 13: [2023-04-27 00:01:37,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 12: [2023-04-27 00:01:37,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
21: [2023-04-27 00:01:37,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
21: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 30: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 30: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 30: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 30: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
29: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 30: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 30: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 30: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 28: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
28: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 29: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
3: [2023-04-27 00:01:37,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 3: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 11: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
29: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 11: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:37,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
16: [2023-04-27 00:01:37,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 16: [2023-04-27 00:01:37,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,294] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 24: [2023-04-27 00:01:37,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,295] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:37,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
11: [2023-04-27 00:01:37,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,295] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,296] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:37,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 28: [2023-04-27 00:01:37,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
24: [2023-04-27 00:01:37,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:37,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 16: [2023-04-27 00:01:37,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
13: [2023-04-27 00:01:37,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 30: [2023-04-27 00:01:37,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 30: [2023-04-27 00:01:37,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 11: [2023-04-27 00:01:37,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 16: [2023-04-27 00:01:37,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 16: [2023-04-27 00:01:37,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 16: [2023-04-27 00:01:37,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 16: [2023-04-27 00:01:37,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
11: [2023-04-27 00:01:37,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 11: [2023-04-27 00:01:37,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 11: [2023-04-27 00:01:37,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
6: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 14: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 14: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:37,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
14: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 6: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 14: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
23: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 30: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 30: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 16: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
26: [2023-04-27 00:01:37,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
26: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 14: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
17: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 29: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 29: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 29: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
23: [2023-04-27 00:01:37,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 14: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
17: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 17: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
5: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 24: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 23: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 28: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 28: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
5: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 20: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 3: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 5: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
0: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 5: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 11: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
8: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 26: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
0: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 5: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
29: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 2: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 2: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 2: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 23: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 8: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
3: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 5: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 5: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 13: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 27: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 2: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
16: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 2: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 27: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 27: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 27: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 5: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 2: [2023-04-27 00:01:37,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:37,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 5: [2023-04-27 00:01:37,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:37,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:37,314] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 0: [2023-04-27 00:01:37,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:37,314] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 2: [2023-04-27 00:01:37,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 16: [2023-04-27 00:01:37,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 27: [2023-04-27 00:01:37,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 6: [2023-04-27 00:01:37,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 13: [2023-04-27 00:01:37,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
13: [2023-04-27 00:01:37,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 6: [2023-04-27 00:01:37,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt... 29: [2023-04-27 00:01:37,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 16: [2023-04-27 00:01:37,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
30: [2023-04-27 00:01:37,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 29: [2023-04-27 00:01:37,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 3: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 3: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
3: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 28: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 3: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 28: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 29: [2023-04-27 00:01:37,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 3: [2023-04-27 00:01:37,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,321] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 28: [2023-04-27 00:01:37,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
30: [2023-04-27 00:01:37,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 3: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
22: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 22: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 16: [2023-04-27 00:01:37,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
5: [2023-04-27 00:01:37,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 28: [2023-04-27 00:01:37,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 29: [2023-04-27 00:01:37,328] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
13: [2023-04-27 00:01:37,329] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 30: [2023-04-27 00:01:37,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
6: [2023-04-27 00:01:37,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
20: [2023-04-27 00:01:37,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
14: [2023-04-27 00:01:37,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 28: [2023-04-27 00:01:37,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 5: [2023-04-27 00:01:37,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
20: [2023-04-27 00:01:37,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 27: [2023-04-27 00:01:37,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 27: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
5: [2023-04-27 00:01:37,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 27: [2023-04-27 00:01:37,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
2: [2023-04-27 00:01:37,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:37,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
14: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 
8: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 20: [2023-04-27 00:01:37,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:37,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:37,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
2: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 2: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 19: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
19: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 19: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 19: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 19: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 19: [2023-04-27 00:01:37,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
19: [2023-04-27 00:01:37,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 19: [2023-04-27 00:01:37,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 19: [2023-04-27 00:01:37,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 5: [2023-04-27 00:01:37,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 27: [2023-04-27 00:01:37,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 23: [2023-04-27 00:01:37,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
25: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 8: [2023-04-27 00:01:37,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
8: [2023-04-27 00:01:37,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 8: [2023-04-27 00:01:37,358] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 0: [2023-04-27 00:01:37,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 8: [2023-04-27 00:01:37,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
8: [2023-04-27 00:01:37,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 8: [2023-04-27 00:01:37,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:37,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:37,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 26: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
14: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 17: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 31: [2023-04-27 00:01:37,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 22: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 22: [2023-04-27 00:01:37,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
22: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 22: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 22: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 22: [2023-04-27 00:01:37,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 22: [2023-04-27 00:01:37,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
22: [2023-04-27 00:01:37,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 22: [2023-04-27 00:01:37,357] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
22: [2023-04-27 00:01:37,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 10: [2023-04-27 00:01:37,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
10: [2023-04-27 00:01:37,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 19: [2023-04-27 00:01:37,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 19: [2023-04-27 00:01:37,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 19: [2023-04-27 00:01:37,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 19: [2023-04-27 00:01:37,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
10: [2023-04-27 00:01:37,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 18: [2023-04-27 00:01:37,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 18: [2023-04-27 00:01:37,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
17: [2023-04-27 00:01:37,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_20-model_00-model_states.pt. 14: [2023-04-27 00:01:37,373] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,374] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 18: [2023-04-27 00:01:37,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
19: [2023-04-27 00:01:37,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 18: [2023-04-27 00:01:37,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 18: [2023-04-27 00:01:37,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 18: [2023-04-27 00:01:37,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
25: [2023-04-27 00:01:37,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 10: [2023-04-27 00:01:37,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 10: [2023-04-27 00:01:37,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 18: [2023-04-27 00:01:37,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
18: [2023-04-27 00:01:37,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 18: [2023-04-27 00:01:37,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 25: [2023-04-27 00:01:37,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 10: [2023-04-27 00:01:37,386] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
10: [2023-04-27 00:01:37,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,387] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 31: [2023-04-27 00:01:37,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,390] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 19: [2023-04-27 00:01:37,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
31: [2023-04-27 00:01:37,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,394] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,396] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
31: [2023-04-27 00:01:37,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,398] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 25: [2023-04-27 00:01:37,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
9: [2023-04-27 00:01:37,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,404] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,405] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
7: [2023-04-27 00:01:37,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 7: [2023-04-27 00:01:37,415] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
9: [2023-04-27 00:01:37,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,426] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,427] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
9: [2023-04-27 00:01:37,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,428] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 7: [2023-04-27 00:01:37,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,447] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,449] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
1: [2023-04-27 00:01:37,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,450] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 13: [2023-04-27 00:01:37,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 27: [2023-04-27 00:01:37,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 9: [2023-04-27 00:01:37,457] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,464] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 27: [2023-04-27 00:01:37,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
27: [2023-04-27 00:01:37,476] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 13: [2023-04-27 00:01:37,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 1: [2023-04-27 00:01:37,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,480] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 15: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 15: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 15: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
15: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 15: [2023-04-27 00:01:37,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 15: [2023-04-27 00:01:37,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 15: [2023-04-27 00:01:37,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
15: [2023-04-27 00:01:37,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,488] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 1: [2023-04-27 00:01:37,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 0: [2023-04-27 00:01:37,493] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 1: [2023-04-27 00:01:37,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,501] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,503] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
8: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
17: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 17: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
17: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 6: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,504] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 8: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 8: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 6: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 8: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
2: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 14: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 23: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 20: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 20: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 20: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
20: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,506] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
15: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 15: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 20: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 20: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 20: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
12: [2023-04-27 00:01:37,507] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 2: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 8: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 8: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 8: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
5: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,508] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 
12: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 20: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 5: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,509] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
20: [2023-04-27 00:01:37,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 15: [2023-04-27 00:01:37,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 15: [2023-04-27 00:01:37,510] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,510] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt... 12: [2023-04-27 00:01:37,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
4: [2023-04-27 00:01:37,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,514] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
4: [2023-04-27 00:01:37,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,517] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,518] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:37,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 20: [2023-04-27 00:01:37,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
23: [2023-04-27 00:01:37,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,521] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:37,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
8: [2023-04-27 00:01:37,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 23: [2023-04-27 00:01:37,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
6: [2023-04-27 00:01:37,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
12: [2023-04-27 00:01:37,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
20: [2023-04-27 00:01:37,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 23: [2023-04-27 00:01:37,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 15: [2023-04-27 00:01:37,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
6: [2023-04-27 00:01:37,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,530] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 20: [2023-04-27 00:01:37,531] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
26: [2023-04-27 00:01:37,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 23: [2023-04-27 00:01:37,532] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 0: [2023-04-27 00:01:37,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 23: [2023-04-27 00:01:37,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 6: [2023-04-27 00:01:37,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
0: [2023-04-27 00:01:37,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:37,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 
12: [2023-04-27 00:01:37,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,535] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 14: [2023-04-27 00:01:37,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
0: [2023-04-27 00:01:37,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,538] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:37,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 23: [2023-04-27 00:01:37,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 15: [2023-04-27 00:01:37,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
2: [2023-04-27 00:01:37,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:37,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:37,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,540] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
23: [2023-04-27 00:01:37,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,541] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:37,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
5: [2023-04-27 00:01:37,543] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 5: [2023-04-27 00:01:37,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
8: [2023-04-27 00:01:37,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:37,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 4: [2023-04-27 00:01:37,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
20: [2023-04-27 00:01:37,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 20: [2023-04-27 00:01:37,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 26: [2023-04-27 00:01:37,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 12: [2023-04-27 00:01:37,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
14: [2023-04-27 00:01:37,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:37,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 20: [2023-04-27 00:01:37,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 2: [2023-04-27 00:01:37,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
20: [2023-04-27 00:01:37,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 23: [2023-04-27 00:01:37,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:37,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 8: [2023-04-27 00:01:37,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 17: [2023-04-27 00:01:37,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
17: [2023-04-27 00:01:37,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 4: [2023-04-27 00:01:37,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 17: [2023-04-27 00:01:37,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
21: [2023-04-27 00:01:37,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
21: [2023-04-27 00:01:37,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_21-model_00-model_states.pt. 21: [2023-04-27 00:01:37,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
30: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
29: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 11: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
29: [2023-04-27 00:01:37,570] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 11: [2023-04-27 00:01:37,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
29: [2023-04-27 00:01:37,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 28: [2023-04-27 00:01:37,580] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,580] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
21: [2023-04-27 00:01:37,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,584] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
24: [2023-04-27 00:01:37,584] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 28: [2023-04-27 00:01:37,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 28: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 30: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,585] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 29: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,586] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,587] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
24: [2023-04-27 00:01:37,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,588] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 11: [2023-04-27 00:01:37,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,589] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
30: [2023-04-27 00:01:37,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 10: [2023-04-27 00:01:37,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
10: [2023-04-27 00:01:37,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 11: [2023-04-27 00:01:37,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 10: [2023-04-27 00:01:37,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
30: [2023-04-27 00:01:37,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,596] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
29: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
9: [2023-04-27 00:01:37,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 10: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
9: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 24: [2023-04-27 00:01:37,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
16: [2023-04-27 00:01:37,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,601] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 11: [2023-04-27 00:01:37,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 24: [2023-04-27 00:01:37,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
30: [2023-04-27 00:01:37,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,604] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
3: [2023-04-27 00:01:37,606] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 21: [2023-04-27 00:01:37,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,607] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 29: [2023-04-27 00:01:37,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
3: [2023-04-27 00:01:37,608] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 10: [2023-04-27 00:01:37,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 30: [2023-04-27 00:01:37,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 21: [2023-04-27 00:01:37,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
16: [2023-04-27 00:01:37,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,615] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,615] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
16: [2023-04-27 00:01:37,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,617] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 18: [2023-04-27 00:01:37,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 18: [2023-04-27 00:01:37,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
10: [2023-04-27 00:01:37,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 3: [2023-04-27 00:01:37,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 28: [2023-04-27 00:01:37,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 18: [2023-04-27 00:01:37,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 28: [2023-04-27 00:01:37,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,623] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,626] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,626] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
9: [2023-04-27 00:01:37,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:37,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,631] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,630] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
19: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 28: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
19: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,632] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 19: [2023-04-27 00:01:37,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 9: [2023-04-27 00:01:37,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
18: [2023-04-27 00:01:37,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:37,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 3: [2023-04-27 00:01:37,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 3: [2023-04-27 00:01:37,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 3: [2023-04-27 00:01:37,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,637] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,638] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
3: [2023-04-27 00:01:37,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,639] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 9: [2023-04-27 00:01:37,641] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,646] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 3: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,647] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 18: [2023-04-27 00:01:37,648] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,649] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 10: [2023-04-27 00:01:37,650] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,651] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,653] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 10: [2023-04-27 00:01:37,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 3: [2023-04-27 00:01:37,655] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
19: [2023-04-27 00:01:37,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,660] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,660] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,661] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,663] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,663] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
25: [2023-04-27 00:01:37,664] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 18: [2023-04-27 00:01:37,666] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,667] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 25: [2023-04-27 00:01:37,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 25: [2023-04-27 00:01:37,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
19: [2023-04-27 00:01:37,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 25: [2023-04-27 00:01:37,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 25: [2023-04-27 00:01:37,680] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,683] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 19: [2023-04-27 00:01:37,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,685] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,688] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:37,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
25: [2023-04-27 00:01:37,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:37,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
7: [2023-04-27 00:01:37,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
13: [2023-04-27 00:01:37,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
13: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
31: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
22: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
22: [2023-04-27 00:01:37,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,696] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
0: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
0: [2023-04-27 00:01:37,698] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 25: [2023-04-27 00:01:37,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:37,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,701] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
7: [2023-04-27 00:01:37,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:37,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,711] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
20: [2023-04-27 00:01:37,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:37,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:37,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 23: [2023-04-27 00:01:37,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
22: [2023-04-27 00:01:37,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 23: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
23: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 23: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
12: [2023-04-27 00:01:37,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:37,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 27: [2023-04-27 00:01:37,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:37,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 27: [2023-04-27 00:01:37,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
23: [2023-04-27 00:01:37,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
5: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 2: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 23: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
5: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 12: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 27: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 27: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 27: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 27: [2023-04-27 00:01:37,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
27: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 5: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
26: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
26: [2023-04-27 00:01:37,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 27: [2023-04-27 00:01:37,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 26: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
8: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 7: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 12: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
22: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 8: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 31: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
12: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:37,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 17: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 0: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 
0: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 22: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
31: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 31: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 20: [2023-04-27 00:01:37,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
14: [2023-04-27 00:01:37,726] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
6: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 13: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 17: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
6: [2023-04-27 00:01:37,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 14: [2023-04-27 00:01:37,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 6: [2023-04-27 00:01:37,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 27: [2023-04-27 00:01:37,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt... 31: [2023-04-27 00:01:37,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
20: [2023-04-27 00:01:37,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:37,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:37,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
20: [2023-04-27 00:01:37,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 7: [2023-04-27 00:01:37,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 13: [2023-04-27 00:01:37,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
20: [2023-04-27 00:01:37,735] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:37,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:37,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:37,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:37,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
2: [2023-04-27 00:01:37,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:37,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 27: [2023-04-27 00:01:37,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
1: [2023-04-27 00:01:37,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
23: [2023-04-27 00:01:37,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 31: [2023-04-27 00:01:37,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:37,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
1: [2023-04-27 00:01:37,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:37,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 22: [2023-04-27 00:01:37,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:37,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:37,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:37,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
20: [2023-04-27 00:01:37,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:37,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 31: [2023-04-27 00:01:37,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:37,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:37,744] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
0: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
5: [2023-04-27 00:01:37,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:37,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:37,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:37,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:37,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
27: [2023-04-27 00:01:37,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:37,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 4: [2023-04-27 00:01:37,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
4: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 4: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 4: [2023-04-27 00:01:37,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:37,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
4: [2023-04-27 00:01:37,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 4: [2023-04-27 00:01:37,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 4: [2023-04-27 00:01:37,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 0: [2023-04-27 00:01:37,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
14: [2023-04-27 00:01:37,753] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 23: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 2: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 15: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
15: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,754] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
16: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
15: [2023-04-27 00:01:37,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:37,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:37,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
16: [2023-04-27 00:01:37,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:37,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:37,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,759] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
8: [2023-04-27 00:01:37,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 27: [2023-04-27 00:01:37,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:37,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:37,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 17: [2023-04-27 00:01:37,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
5: [2023-04-27 00:01:37,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 17: [2023-04-27 00:01:37,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:37,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 5: [2023-04-27 00:01:37,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:37,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:37,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:37,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
14: [2023-04-27 00:01:37,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 26: [2023-04-27 00:01:37,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 20: [2023-04-27 00:01:37,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:37,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:37,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:37,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 17: [2023-04-27 00:01:37,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
1: [2023-04-27 00:01:37,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:37,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 17: [2023-04-27 00:01:37,765] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,765] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 4: [2023-04-27 00:01:37,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 4: [2023-04-27 00:01:37,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:37,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 1: [2023-04-27 00:01:37,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
27: [2023-04-27 00:01:37,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:37,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,768] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:37,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,769] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
4: [2023-04-27 00:01:37,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:37,770] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 6: [2023-04-27 00:01:37,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:37,771] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:37,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:37,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,773] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
27: [2023-04-27 00:01:37,773] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 8: [2023-04-27 00:01:37,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:37,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:37,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,775] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,775] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:37,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 14: [2023-04-27 00:01:37,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 
16: [2023-04-27 00:01:37,776] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:37,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 4: [2023-04-27 00:01:37,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:37,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
6: [2023-04-27 00:01:37,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:37,778] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 4: [2023-04-27 00:01:37,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
30: [2023-04-27 00:01:37,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,782] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,782] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
30: [2023-04-27 00:01:37,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,783] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,783] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:37,784] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 16: [2023-04-27 00:01:37,784] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:37,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:37,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
14: [2023-04-27 00:01:37,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,786] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:37,787] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_22-model_00-model_states.pt. 4: [2023-04-27 00:01:37,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:37,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
4: [2023-04-27 00:01:37,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,789] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,792] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,792] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,793] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:37,794] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,796] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
9: [2023-04-27 00:01:37,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:37,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:37,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,797] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:37,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:37,797] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,798] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,799] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
30: [2023-04-27 00:01:37,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,800] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,801] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,805] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,808] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
9: [2023-04-27 00:01:37,809] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:37,811] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,812] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:37,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,813] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,814] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,816] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
30: [2023-04-27 00:01:37,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:37,820] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:37,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,822] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:37,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,823] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
24: [2023-04-27 00:01:37,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
24: [2023-04-27 00:01:37,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,826] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:37,829] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,832] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:37,833] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:37,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:37,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
10: [2023-04-27 00:01:37,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,834] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
10: [2023-04-27 00:01:37,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:37,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:37,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,835] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:37,838] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:37,841] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:37,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,844] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
24: [2023-04-27 00:01:37,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,852] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,854] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,855] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,860] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
24: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
12: [2023-04-27 00:01:37,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,867] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,869] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
12: [2023-04-27 00:01:37,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:37,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:37,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,881] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:37,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
10: [2023-04-27 00:01:37,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:37,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:37,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 12: [2023-04-27 00:01:37,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 11: [2023-04-27 00:01:37,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,890] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 12: [2023-04-27 00:01:37,891] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
12: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 12: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 12: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
29: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
1: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:37,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 1: [2023-04-27 00:01:37,894] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
1: [2023-04-27 00:01:37,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:37,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 12: [2023-04-27 00:01:37,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:37,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,896] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:37,900] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:37,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,908] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
1: [2023-04-27 00:01:37,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,909] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,910] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:37,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:37,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
22: [2023-04-27 00:01:37,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:37,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:37,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,912] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
16: [2023-04-27 00:01:37,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,914] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:37,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
11: [2023-04-27 00:01:37,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,919] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:37,919] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:37,920] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
1: [2023-04-27 00:01:37,921] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,922] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:37,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,922] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:37,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:37,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:37,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:37,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:37,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
29: [2023-04-27 00:01:37,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 1: [2023-04-27 00:01:37,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:37,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,927] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:37,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:37,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:37,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:37,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:37,928] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
11: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:37,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:37,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
29: [2023-04-27 00:01:37,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:37,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:37,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:37,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:37,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:37,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:37,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 16: [2023-04-27 00:01:37,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:37,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
29: [2023-04-27 00:01:37,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:37,935] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:37,936] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:37,936] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:37,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:37,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
30: [2023-04-27 00:01:37,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,942] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:37,943] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:37,943] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
30: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,945] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
16: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
15: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 3: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,946] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:37,947] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
30: [2023-04-27 00:01:37,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:37,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,947] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:37,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 16: [2023-04-27 00:01:37,948] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:37,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,949] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:37,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
16: [2023-04-27 00:01:37,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:37,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 31: [2023-04-27 00:01:37,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 31: [2023-04-27 00:01:37,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 31: [2023-04-27 00:01:37,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
31: [2023-04-27 00:01:37,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 31: [2023-04-27 00:01:37,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 31: [2023-04-27 00:01:37,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 31: [2023-04-27 00:01:37,951] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:37,952] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:37,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:37,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,955] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:37,955] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 4: [2023-04-27 00:01:37,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:37,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:37,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:37,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
30: [2023-04-27 00:01:37,958] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:37,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,960] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:37,960] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:37,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:37,961] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:37,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
4: [2023-04-27 00:01:37,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:37,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:37,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:37,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
24: [2023-04-27 00:01:37,965] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,965] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
24: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,966] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:37,967] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,967] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:37,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
28: [2023-04-27 00:01:37,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,968] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,968] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
28: [2023-04-27 00:01:37,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,969] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,969] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:37,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:37,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:37,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
10: [2023-04-27 00:01:37,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,970] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:37,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:37,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:37,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:37,971] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
31: [2023-04-27 00:01:37,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,971] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:37,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,972] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:37,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:37,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:37,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
4: [2023-04-27 00:01:37,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:37,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:37,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
15: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 25: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:37,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
10: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 15: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,977] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 21: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
21: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 21: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 21: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 21: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 18: [2023-04-27 00:01:37,978] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
19: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
30: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:37,979] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 21: [2023-04-27 00:01:37,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 18: [2023-04-27 00:01:37,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:37,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
21: [2023-04-27 00:01:37,981] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:37,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:37,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:37,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:37,982] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:37,982] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:37,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
31: [2023-04-27 00:01:37,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:37,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:37,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,983] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:37,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:37,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:37,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:37,983] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:37,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:37,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
27: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 27: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 24: [2023-04-27 00:01:37,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:37,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:37,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 15: [2023-04-27 00:01:37,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 15: [2023-04-27 00:01:37,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:37,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:37,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:37,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:37,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:37,987] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
18: [2023-04-27 00:01:37,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,988] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:37,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:37,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 30: [2023-04-27 00:01:37,989] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:37,989] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:37,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
24: [2023-04-27 00:01:37,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:37,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 13: [2023-04-27 00:01:37,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 28: [2023-04-27 00:01:37,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 30: [2023-04-27 00:01:37,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:37,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:37,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:37,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:37,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
24: [2023-04-27 00:01:37,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:37,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:37,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:37,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:37,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:37,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 15: [2023-04-27 00:01:37,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:37,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 21: [2023-04-27 00:01:37,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
21: [2023-04-27 00:01:37,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,995] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:37,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:37,997] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:37,997] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:37,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:37,998] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:37,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:37,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 24: [2023-04-27 00:01:38,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
28: [2023-04-27 00:01:38,000] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 13: [2023-04-27 00:01:38,000] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:38,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:38,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:38,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,001] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:38,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:38,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:38,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
19: [2023-04-27 00:01:38,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:38,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:38,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 6: [2023-04-27 00:01:38,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 21: [2023-04-27 00:01:38,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 24: [2023-04-27 00:01:38,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 19: [2023-04-27 00:01:38,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:38,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:38,006] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
21: [2023-04-27 00:01:38,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:38,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:38,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 7: [2023-04-27 00:01:38,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:38,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:38,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 21: [2023-04-27 00:01:38,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:38,007] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:38,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:38,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:38,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:38,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:38,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 13: [2023-04-27 00:01:38,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:38,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 21: [2023-04-27 00:01:38,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:38,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 21: [2023-04-27 00:01:38,012] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:38,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
7: [2023-04-27 00:01:38,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 19: [2023-04-27 00:01:38,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,014] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:38,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:38,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:38,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:38,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
25: [2023-04-27 00:01:38,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:38,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 25: [2023-04-27 00:01:38,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:38,016] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:38,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:38,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
28: [2023-04-27 00:01:38,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:38,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:38,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:38,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 0: [2023-04-27 00:01:38,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:38,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:38,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:38,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
0: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 10: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 0: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 13: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:38,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 0: [2023-04-27 00:01:38,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:38,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:38,021] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:38,022] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
7: [2023-04-27 00:01:38,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:38,023] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,023] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:38,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:38,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
2: [2023-04-27 00:01:38,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:38,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:38,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:38,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:38,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:38,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
2: [2023-04-27 00:01:38,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 10: [2023-04-27 00:01:38,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 27: [2023-04-27 00:01:38,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:38,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:38,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
10: [2023-04-27 00:01:38,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 21: [2023-04-27 00:01:38,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 7: [2023-04-27 00:01:38,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:38,028] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,028] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 7: [2023-04-27 00:01:38,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
2: [2023-04-27 00:01:38,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:38,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 10: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
29: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 20: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 20: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 20: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 20: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
20: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:38,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:38,031] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 20: [2023-04-27 00:01:38,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 20: [2023-04-27 00:01:38,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 20: [2023-04-27 00:01:38,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:38,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:38,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
22: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 20: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 20: [2023-04-27 00:01:38,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
22: [2023-04-27 00:01:38,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:38,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:38,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:38,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 22: [2023-04-27 00:01:38,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 20: [2023-04-27 00:01:38,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 13: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
6: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,035] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:38,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
22: [2023-04-27 00:01:38,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:38,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:38,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,038] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,040] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 0: [2023-04-27 00:01:38,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,041] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 6: [2023-04-27 00:01:38,041] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
2: [2023-04-27 00:01:38,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,042] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 13: [2023-04-27 00:01:38,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:38,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:38,045] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
29: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
5: [2023-04-27 00:01:38,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
0: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:38,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,048] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 9: [2023-04-27 00:01:38,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:38,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 12: [2023-04-27 00:01:38,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
12: [2023-04-27 00:01:38,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 12: [2023-04-27 00:01:38,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 12: [2023-04-27 00:01:38,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 12: [2023-04-27 00:01:38,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 12: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 12: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 12: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 20: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 12: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
12: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:38,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 12: [2023-04-27 00:01:38,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:38,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 12: [2023-04-27 00:01:38,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
12: [2023-04-27 00:01:38,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,053] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
9: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:38,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 11: [2023-04-27 00:01:38,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,055] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:38,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,056] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
9: [2023-04-27 00:01:38,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:38,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:38,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 17: [2023-04-27 00:01:38,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 17: [2023-04-27 00:01:38,057] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:38,057] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:38,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
2: [2023-04-27 00:01:38,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:38,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:38,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 17: [2023-04-27 00:01:38,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:38,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 0: [2023-04-27 00:01:38,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
17: [2023-04-27 00:01:38,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:38,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 11: [2023-04-27 00:01:38,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 5: [2023-04-27 00:01:38,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 20: [2023-04-27 00:01:38,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:38,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
22: [2023-04-27 00:01:38,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:38,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:38,063] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,063] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
26: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
11: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 17: [2023-04-27 00:01:38,064] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 11: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 14: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:38,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
26: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 26: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 12: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 5: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,066] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 
5: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 2: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
8: [2023-04-27 00:01:38,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:38,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:38,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 14: [2023-04-27 00:01:38,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:38,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 29: [2023-04-27 00:01:38,068] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:38,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
26: [2023-04-27 00:01:38,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 0: [2023-04-27 00:01:38,069] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 23: [2023-04-27 00:01:38,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 9: [2023-04-27 00:01:38,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:38,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 12: [2023-04-27 00:01:38,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 20: [2023-04-27 00:01:38,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 2: [2023-04-27 00:01:38,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 29: [2023-04-27 00:01:38,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:38,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
17: [2023-04-27 00:01:38,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 20: [2023-04-27 00:01:38,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:38,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:38,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 8: [2023-04-27 00:01:38,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt... 11: [2023-04-27 00:01:38,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
5: [2023-04-27 00:01:38,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 29: [2023-04-27 00:01:38,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:38,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 5: [2023-04-27 00:01:38,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:38,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 5: [2023-04-27 00:01:38,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 5: [2023-04-27 00:01:38,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:38,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
12: [2023-04-27 00:01:38,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 12: [2023-04-27 00:01:38,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 8: [2023-04-27 00:01:38,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:38,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 20: [2023-04-27 00:01:38,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 22: [2023-04-27 00:01:38,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 23: [2023-04-27 00:01:38,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
23: [2023-04-27 00:01:38,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 20: [2023-04-27 00:01:38,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 20: [2023-04-27 00:01:38,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:38,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:38,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 23: [2023-04-27 00:01:38,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
22: [2023-04-27 00:01:38,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:38,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
23: [2023-04-27 00:01:38,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 23: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
23: [2023-04-27 00:01:38,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 12: [2023-04-27 00:01:38,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 14: [2023-04-27 00:01:38,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:38,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:38,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
17: [2023-04-27 00:01:38,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
3: [2023-04-27 00:01:38,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:38,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:38,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 23: [2023-04-27 00:01:38,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 23: [2023-04-27 00:01:38,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 17: [2023-04-27 00:01:38,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 
14: [2023-04-27 00:01:38,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 26: [2023-04-27 00:01:38,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:38,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:38,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:38,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
28: [2023-04-27 00:01:38,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:38,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:38,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
28: [2023-04-27 00:01:38,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:38,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:38,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 28: [2023-04-27 00:01:38,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:38,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
8: [2023-04-27 00:01:38,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 31: [2023-04-27 00:01:38,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:38,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 17: [2023-04-27 00:01:38,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
17: [2023-04-27 00:01:38,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,116] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:38,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:38,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
1: [2023-04-27 00:01:38,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
31: [2023-04-27 00:01:38,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:38,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 14: [2023-04-27 00:01:38,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 8: [2023-04-27 00:01:38,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 3: [2023-04-27 00:01:38,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:38,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 17: [2023-04-27 00:01:38,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_23-model_00-model_states.pt. 31: [2023-04-27 00:01:38,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
28: [2023-04-27 00:01:38,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:38,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
1: [2023-04-27 00:01:38,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,133] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
4: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:38,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
8: [2023-04-27 00:01:38,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 4: [2023-04-27 00:01:38,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
21: [2023-04-27 00:01:38,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 21: [2023-04-27 00:01:38,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 21: [2023-04-27 00:01:38,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 21: [2023-04-27 00:01:38,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:38,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:38,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:38,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:38,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 21: [2023-04-27 00:01:38,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
21: [2023-04-27 00:01:38,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 21: [2023-04-27 00:01:38,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:38,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 21: [2023-04-27 00:01:38,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 28: [2023-04-27 00:01:38,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:38,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 16: [2023-04-27 00:01:38,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
16: [2023-04-27 00:01:38,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,146] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
28: [2023-04-27 00:01:38,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,151] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
1: [2023-04-27 00:01:38,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:38,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:38,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
16: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
21: [2023-04-27 00:01:38,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 21: [2023-04-27 00:01:38,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:38,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,168] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
21: [2023-04-27 00:01:38,169] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,170] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,171] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:38,172] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
24: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:38,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:38,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
24: [2023-04-27 00:01:38,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:38,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:38,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
21: [2023-04-27 00:01:38,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:38,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
15: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 15: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
15: [2023-04-27 00:01:38,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 15: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 25: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 25: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
25: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 11: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 21: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 15: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
25: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 25: [2023-04-27 00:01:38,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
11: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 11: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 11: [2023-04-27 00:01:38,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 11: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
22: [2023-04-27 00:01:38,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 11: [2023-04-27 00:01:38,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
22: [2023-04-27 00:01:38,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:38,190] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:38,191] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:38,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:38,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
24: [2023-04-27 00:01:38,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,193] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,193] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
30: [2023-04-27 00:01:38,195] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:38,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
18: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 25: [2023-04-27 00:01:38,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 30: [2023-04-27 00:01:38,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
30: [2023-04-27 00:01:38,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:38,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 30: [2023-04-27 00:01:38,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
24: [2023-04-27 00:01:38,202] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,203] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,203] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,204] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,204] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
11: [2023-04-27 00:01:38,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,207] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,206] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,208] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 25: [2023-04-27 00:01:38,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 15: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
29: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
29: [2023-04-27 00:01:38,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,210] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 11: [2023-04-27 00:01:38,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,210] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:38,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,211] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,211] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
18: [2023-04-27 00:01:38,212] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,213] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,213] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:38,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,214] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
30: [2023-04-27 00:01:38,215] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,216] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,217] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,217] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,218] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
18: [2023-04-27 00:01:38,218] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 25: [2023-04-27 00:01:38,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 15: [2023-04-27 00:01:38,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,220] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,221] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,222] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 25: [2023-04-27 00:01:38,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
22: [2023-04-27 00:01:38,222] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,223] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
29: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 27: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 25: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
27: [2023-04-27 00:01:38,225] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 22: [2023-04-27 00:01:38,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,226] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,226] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 27: [2023-04-27 00:01:38,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 15: [2023-04-27 00:01:38,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,227] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,227] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,228] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,228] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 25: [2023-04-27 00:01:38,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 22: [2023-04-27 00:01:38,230] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,230] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,232] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,233] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
25: [2023-04-27 00:01:38,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 15: [2023-04-27 00:01:38,233] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,234] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,234] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,235] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 11: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
3: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,235] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 31: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 31: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
10: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,236] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 31: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 31: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 31: [2023-04-27 00:01:38,237] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,238] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 31: [2023-04-27 00:01:38,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
27: [2023-04-27 00:01:38,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 10: [2023-04-27 00:01:38,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:38,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:38,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
7: [2023-04-27 00:01:38,242] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:38,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 7: [2023-04-27 00:01:38,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:38,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:38,243] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 7: [2023-04-27 00:01:38,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:38,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:38,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,243] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
7: [2023-04-27 00:01:38,244] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:38,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 7: [2023-04-27 00:01:38,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 7: [2023-04-27 00:01:38,245] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 3: [2023-04-27 00:01:38,247] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,249] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,250] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
31: [2023-04-27 00:01:38,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 31: [2023-04-27 00:01:38,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,253] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 27: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,254] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:38,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,255] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
9: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 31: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
31: [2023-04-27 00:01:38,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,257] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,258] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,258] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:38,259] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
21: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,261] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,262] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,262] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,263] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 21: [2023-04-27 00:01:38,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,264] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
27: [2023-04-27 00:01:38,265] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,266] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,267] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,267] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 31: [2023-04-27 00:01:38,268] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
3: [2023-04-27 00:01:38,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,271] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:38,271] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,273] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 3: [2023-04-27 00:01:38,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 31: [2023-04-27 00:01:38,274] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:38,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,276] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 3: [2023-04-27 00:01:38,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 27: [2023-04-27 00:01:38,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 21: [2023-04-27 00:01:38,277] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,278] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
28: [2023-04-27 00:01:38,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,279] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,279] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 9: [2023-04-27 00:01:38,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,280] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,281] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,282] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
21: [2023-04-27 00:01:38,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 21: [2023-04-27 00:01:38,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 9: [2023-04-27 00:01:38,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,283] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 31: [2023-04-27 00:01:38,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 23: [2023-04-27 00:01:38,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,284] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 7: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 23: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
28: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,285] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,286] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 27: [2023-04-27 00:01:38,287] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 28: [2023-04-27 00:01:38,288] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 28: [2023-04-27 00:01:38,289] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,290] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
21: [2023-04-27 00:01:38,291] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 9: [2023-04-27 00:01:38,292] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,293] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
21: [2023-04-27 00:01:38,294] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,296] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 7: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
7: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 0: [2023-04-27 00:01:38,297] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 0: [2023-04-27 00:01:38,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 9: [2023-04-27 00:01:38,298] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 0: [2023-04-27 00:01:38,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
19: [2023-04-27 00:01:38,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,299] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
19: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 0: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 7: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 9: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 13: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
13: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 13: [2023-04-27 00:01:38,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 13: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 13: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 13: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 13: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 13: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 13: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 13: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:38,302] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 13: [2023-04-27 00:01:38,302] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 13: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,303] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
2: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 12: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 21: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
6: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,305] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 2: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 12: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 12: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
2: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 6: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 9: [2023-04-27 00:01:38,306] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 26: [2023-04-27 00:01:38,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 8: [2023-04-27 00:01:38,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,307] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
6: [2023-04-27 00:01:38,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 12: [2023-04-27 00:01:38,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,308] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,308] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 5: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
14: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
14: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
8: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 8: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 8: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 8: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
8: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
5: [2023-04-27 00:01:38,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 23: [2023-04-27 00:01:38,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:38,311] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 5: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 5: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 5: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 14: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
12: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 23: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 5: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 5: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 1: [2023-04-27 00:01:38,312] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,315] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,315] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
12: [2023-04-27 00:01:38,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 12: [2023-04-27 00:01:38,316] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 12: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,316] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 12: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
25: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 12: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 4: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 4: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 4: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 4: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,317] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
4: [2023-04-27 00:01:38,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 4: [2023-04-27 00:01:38,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 4: [2023-04-27 00:01:38,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 4: [2023-04-27 00:01:38,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,318] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:38,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,318] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,319] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,319] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,320] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,320] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 23: [2023-04-27 00:01:38,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,321] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
18: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 16: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,323] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
16: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 13: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,324] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
0: [2023-04-27 00:01:38,325] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,325] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 8: [2023-04-27 00:01:38,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,326] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,326] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 8: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,327] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,329] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 27: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 27: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 27: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,330] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 6: [2023-04-27 00:01:38,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,331] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 13: [2023-04-27 00:01:38,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,332] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
19: [2023-04-27 00:01:38,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,333] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 14: [2023-04-27 00:01:38,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 8: [2023-04-27 00:01:38,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 20: [2023-04-27 00:01:38,334] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:38,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 1: [2023-04-27 00:01:38,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 1: [2023-04-27 00:01:38,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,335] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 1: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 1: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
4: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 4: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
4: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 4: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 14: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,337] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,338] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
13: [2023-04-27 00:01:38,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,338] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 19: [2023-04-27 00:01:38,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 0: [2023-04-27 00:01:38,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,339] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,340] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,341] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 16: [2023-04-27 00:01:38,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 26: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 4: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,341] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 17: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 17: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 17: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 
17: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 5: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 8: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 17: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
17: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 17: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 17: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 27: [2023-04-27 00:01:38,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 18: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 25: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 25: [2023-04-27 00:01:38,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,344] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
13: [2023-04-27 00:01:38,344] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 25: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 25: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 17: [2023-04-27 00:01:38,345] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 8: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 8: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 8: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 26: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
6: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 1: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,346] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 2: [2023-04-27 00:01:38,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 8: [2023-04-27 00:01:38,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:38,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,347] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 
14: [2023-04-27 00:01:38,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 17: [2023-04-27 00:01:38,347] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt... 19: [2023-04-27 00:01:38,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 1: [2023-04-27 00:01:38,348] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,349] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
16: [2023-04-27 00:01:38,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 18: [2023-04-27 00:01:38,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,350] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,351] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,352] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 5: [2023-04-27 00:01:38,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
6: [2023-04-27 00:01:38,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 5: [2023-04-27 00:01:38,352] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
14: [2023-04-27 00:01:38,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 14: [2023-04-27 00:01:38,353] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 4: [2023-04-27 00:01:38,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 16: [2023-04-27 00:01:38,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
25: [2023-04-27 00:01:38,354] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,355] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,356] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 26: [2023-04-27 00:01:38,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 25: [2023-04-27 00:01:38,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
16: [2023-04-27 00:01:38,356] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,358] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,359] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 14: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 27: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 27: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 27: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 30: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 30: [2023-04-27 00:01:38,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,361] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 30: [2023-04-27 00:01:38,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 30: [2023-04-27 00:01:38,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
30: [2023-04-27 00:01:38,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 4: [2023-04-27 00:01:38,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,362] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 16: [2023-04-27 00:01:38,364] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
7: [2023-04-27 00:01:38,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
24: [2023-04-27 00:01:38,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 24: [2023-04-27 00:01:38,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 24: [2023-04-27 00:01:38,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 24: [2023-04-27 00:01:38,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,366] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 24: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 24: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
24: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 17: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 24: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 24: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 24: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,367] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
7: [2023-04-27 00:01:38,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 24: [2023-04-27 00:01:38,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,368] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,369] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 17: [2023-04-27 00:01:38,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 14: [2023-04-27 00:01:38,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 14: [2023-04-27 00:01:38,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 16: [2023-04-27 00:01:38,372] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
27: [2023-04-27 00:01:38,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 18: [2023-04-27 00:01:38,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,375] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
30: [2023-04-27 00:01:38,376] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,377] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
10: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 17: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 30: [2023-04-27 00:01:38,378] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,379] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
29: [2023-04-27 00:01:38,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,382] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 10: [2023-04-27 00:01:38,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
10: [2023-04-27 00:01:38,381] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 24: [2023-04-27 00:01:38,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 17: [2023-04-27 00:01:38,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
29: [2023-04-27 00:01:38,383] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 24: [2023-04-27 00:01:38,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,384] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,385] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 24: [2023-04-27 00:01:38,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 30: [2023-04-27 00:01:38,388] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,389] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
24: [2023-04-27 00:01:38,391] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,392] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,393] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
24: [2023-04-27 00:01:38,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,395] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,396] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,397] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,398] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,399] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,400] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
31: [2023-04-27 00:01:38,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,401] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,402] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
7: [2023-04-27 00:01:38,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,403] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,403] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 24: [2023-04-27 00:01:38,404] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,405] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,406] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
7: [2023-04-27 00:01:38,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,407] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 17: [2023-04-27 00:01:38,408] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
15: [2023-04-27 00:01:38,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,409] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,410] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
15: [2023-04-27 00:01:38,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,411] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
11: [2023-04-27 00:01:38,413] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_24-model_00-model_states.pt. 29: [2023-04-27 00:01:38,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,414] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,415] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,416] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,418] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
17: [2023-04-27 00:01:38,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,419] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,420] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
23: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
15: [2023-04-27 00:01:38,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 29: [2023-04-27 00:01:38,422] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
10: [2023-04-27 00:01:38,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,423] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,424] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,425] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 11: [2023-04-27 00:01:38,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,428] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
15: [2023-04-27 00:01:38,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,429] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,431] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,433] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,434] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,434] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
11: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,435] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,436] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
0: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
22: [2023-04-27 00:01:38,437] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
0: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 10: [2023-04-27 00:01:38,438] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,439] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 21: [2023-04-27 00:01:38,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 22: [2023-04-27 00:01:38,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,440] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
22: [2023-04-27 00:01:38,440] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,441] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 11: [2023-04-27 00:01:38,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
11: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,442] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,442] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
28: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,443] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,444] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
28: [2023-04-27 00:01:38,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,445] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,446] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,446] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
15: [2023-04-27 00:01:38,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,447] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,448] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,448] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,449] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,451] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
21: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,452] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,453] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,453] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
3: [2023-04-27 00:01:38,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,454] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,455] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,456] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
0: [2023-04-27 00:01:38,457] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 22: [2023-04-27 00:01:38,458] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,459] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,460] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 15: [2023-04-27 00:01:38,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,463] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,463] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
3: [2023-04-27 00:01:38,464] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,465] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 3: [2023-04-27 00:01:38,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,465] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
0: [2023-04-27 00:01:38,466] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 3: [2023-04-27 00:01:38,467] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,467] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,468] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,469] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,469] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
22: [2023-04-27 00:01:38,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,470] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,470] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,471] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,472] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 18: [2023-04-27 00:01:38,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,472] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,473] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,474] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,474] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
0: [2023-04-27 00:01:38,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 28: [2023-04-27 00:01:38,475] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,475] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 25: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 25: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,477] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 25: [2023-04-27 00:01:38,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
25: [2023-04-27 00:01:38,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,478] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,478] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,479] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,480] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,481] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,481] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,482] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,482] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,483] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,483] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,484] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,485] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 18: [2023-04-27 00:01:38,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,486] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
22: [2023-04-27 00:01:38,486] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,487] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,487] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,488] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,489] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
7: [2023-04-27 00:01:38,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,489] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,490] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,490] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
7: [2023-04-27 00:01:38,491] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 25: [2023-04-27 00:01:38,491] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,492] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,494] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 25: [2023-04-27 00:01:38,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,495] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,495] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,498] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,499] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
25: [2023-04-27 00:01:38,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,500] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,502] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,503] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,505] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,505] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,506] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,511] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
8: [2023-04-27 00:01:38,511] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,512] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,513] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 8: [2023-04-27 00:01:38,513] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,515] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,515] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 8: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
8: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 8: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 8: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 8: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,516] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,517] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,518] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,519] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,520] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,520] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 5: [2023-04-27 00:01:38,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,521] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
2: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,522] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
6: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
26: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,523] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 8: [2023-04-27 00:01:38,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
2: [2023-04-27 00:01:38,524] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,524] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
26: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 26: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 7: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
19: [2023-04-27 00:01:38,525] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,526] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,526] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,527] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,527] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
20: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 27: [2023-04-27 00:01:38,528] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
20: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 25: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,529] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,530] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 7: [2023-04-27 00:01:38,531] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
8: [2023-04-27 00:01:38,532] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,533] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,533] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,534] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,534] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,535] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
7: [2023-04-27 00:01:38,536] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 8: [2023-04-27 00:01:38,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,537] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,537] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 26: [2023-04-27 00:01:38,538] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 25: [2023-04-27 00:01:38,539] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,539] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
2: [2023-04-27 00:01:38,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,540] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,541] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 8: [2023-04-27 00:01:38,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,542] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,542] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,543] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,544] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,544] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
0: [2023-04-27 00:01:38,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,545] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,545] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,546] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
23: [2023-04-27 00:01:38,546] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
0: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,547] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 19: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 5: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,548] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 
14: [2023-04-27 00:01:38,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 14: [2023-04-27 00:01:38,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 23: [2023-04-27 00:01:38,549] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,549] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,550] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
14: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 0: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 5: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 2: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
14: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 6: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,551] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 20: [2023-04-27 00:01:38,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 19: [2023-04-27 00:01:38,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,552] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 2: [2023-04-27 00:01:38,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
19: [2023-04-27 00:01:38,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 5: [2023-04-27 00:01:38,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 5: [2023-04-27 00:01:38,553] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,553] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 13: [2023-04-27 00:01:38,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
5: [2023-04-27 00:01:38,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 5: [2023-04-27 00:01:38,554] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,554] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 6: [2023-04-27 00:01:38,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 17: [2023-04-27 00:01:38,555] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,555] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
17: [2023-04-27 00:01:38,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 23: [2023-04-27 00:01:38,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,556] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 17: [2023-04-27 00:01:38,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,556] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 23: [2023-04-27 00:01:38,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
26: [2023-04-27 00:01:38,557] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 2: [2023-04-27 00:01:38,558] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,558] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 26: [2023-04-27 00:01:38,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 17: [2023-04-27 00:01:38,559] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 2: [2023-04-27 00:01:38,560] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 20: [2023-04-27 00:01:38,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,560] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 2: [2023-04-27 00:01:38,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 13: [2023-04-27 00:01:38,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
17: [2023-04-27 00:01:38,561] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt... 14: [2023-04-27 00:01:38,561] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,562] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,562] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 19: [2023-04-27 00:01:38,563] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,563] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,564] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
8: [2023-04-27 00:01:38,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 2: [2023-04-27 00:01:38,564] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,565] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,565] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,566] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,566] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
16: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 16: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,567] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 16: [2023-04-27 00:01:38,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 16: [2023-04-27 00:01:38,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
16: [2023-04-27 00:01:38,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 16: [2023-04-27 00:01:38,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,568] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,568] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,569] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,569] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
14: [2023-04-27 00:01:38,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,571] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,571] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 19: [2023-04-27 00:01:38,572] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,572] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,573] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,573] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,574] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,575] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
0: [2023-04-27 00:01:38,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,575] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 0: [2023-04-27 00:01:38,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,576] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,577] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 
14: [2023-04-27 00:01:38,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 26: [2023-04-27 00:01:38,579] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 0: [2023-04-27 00:01:38,579] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,581] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,582] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,583] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 16: [2023-04-27 00:01:38,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,582] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,583] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
8: [2023-04-27 00:01:38,585] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 16: [2023-04-27 00:01:38,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,587] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,589] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,588] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 14: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
0: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 1: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 1: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 1: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
1: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 1: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,590] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 1: [2023-04-27 00:01:38,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,591] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,591] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 1: [2023-04-27 00:01:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,592] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
30: [2023-04-27 00:01:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,592] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 12: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 1: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,593] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 1: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
12: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 16: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 12: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
24: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,594] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
24: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,595] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 24: [2023-04-27 00:01:38,596] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 16: [2023-04-27 00:01:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,597] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,598] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
29: [2023-04-27 00:01:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,599] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 29: [2023-04-27 00:01:38,599] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 29: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 29: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 29: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 29: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
29: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,600] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,601] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,602] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 17: [2023-04-27 00:01:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,602] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
29: [2023-04-27 00:01:38,603] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 16: [2023-04-27 00:01:38,604] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,603] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,605] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,606] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,607] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,608] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
11: [2023-04-27 00:01:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,610] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,610] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
1: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 16: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,611] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
30: [2023-04-27 00:01:38,612] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,612] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 30: [2023-04-27 00:01:38,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,613] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 29: [2023-04-27 00:01:38,613] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
14: [2023-04-27 00:01:38,614] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 24: [2023-04-27 00:01:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,614] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 29: [2023-04-27 00:01:38,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,616] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
1: [2023-04-27 00:01:38,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:38,618] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,618] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_25-model_00-model_states.pt. 1: [2023-04-27 00:01:38,619] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 29: [2023-04-27 00:01:38,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 29: [2023-04-27 00:01:38,619] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,620] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,620] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:38,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
29: [2023-04-27 00:01:38,621] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,621] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:38,622] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,622] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,623] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 30: [2023-04-27 00:01:38,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:38,624] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,624] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 12: [2023-04-27 00:01:38,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
12: [2023-04-27 00:01:38,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:38,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:38,625] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:38,627] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:38,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:38,627] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,628] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,628] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
17: [2023-04-27 00:01:38,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 11: [2023-04-27 00:01:38,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 24: [2023-04-27 00:01:38,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:38,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,629] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,629] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 30: [2023-04-27 00:01:38,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
24: [2023-04-27 00:01:38,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,630] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:38,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,631] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:38,633] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:38,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,634] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 29: [2023-04-27 00:01:38,633] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
8: [2023-04-27 00:01:38,634] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 12: [2023-04-27 00:01:38,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,635] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,635] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,636] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,636] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,637] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:38,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
24: [2023-04-27 00:01:38,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:38,638] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:38,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:38,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:38,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:38,639] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
8: [2023-04-27 00:01:38,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,640] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,640] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 29: [2023-04-27 00:01:38,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:38,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
15: [2023-04-27 00:01:38,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,642] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,642] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,643] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
15: [2023-04-27 00:01:38,643] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,644] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 1: [2023-04-27 00:01:38,644] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 15: [2023-04-27 00:01:38,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,645] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,647] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 11: [2023-04-27 00:01:38,649] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,650] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
11: [2023-04-27 00:01:38,652] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:38,654] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,656] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 11: [2023-04-27 00:01:38,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,657] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,657] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 11: [2023-04-27 00:01:38,658] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:38,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,659] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,659] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
31: [2023-04-27 00:01:38,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,662] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,664] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 15: [2023-04-27 00:01:38,665] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,666] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,668] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,669] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
6: [2023-04-27 00:01:38,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,670] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,670] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
6: [2023-04-27 00:01:38,671] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,671] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,672] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,673] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,673] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,674] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 8: [2023-04-27 00:01:38,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,674] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
22: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 15: [2023-04-27 00:01:38,675] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 22: [2023-04-27 00:01:38,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,676] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
15: [2023-04-27 00:01:38,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 15: [2023-04-27 00:01:38,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 22: [2023-04-27 00:01:38,676] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,677] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 22: [2023-04-27 00:01:38,678] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,678] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
21: [2023-04-27 00:01:38,681] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,682] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,682] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,683] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,684] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:38,684] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
8: [2023-04-27 00:01:38,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,686] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 31: [2023-04-27 00:01:38,687] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,688] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:38,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,689] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 15: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:38,689] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
18: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 15: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,690] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 2: [2023-04-27 00:01:38,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,691] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
2: [2023-04-27 00:01:38,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,691] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,692] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,692] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
5: [2023-04-27 00:01:38,693] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 5: [2023-04-27 00:01:38,693] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 2: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
10: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 31: [2023-04-27 00:01:38,694] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
2: [2023-04-27 00:01:38,695] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 2: [2023-04-27 00:01:38,695] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 5: [2023-04-27 00:01:38,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
5: [2023-04-27 00:01:38,696] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 10: [2023-04-27 00:01:38,697] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,697] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,698] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 15: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
13: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 22: [2023-04-27 00:01:38,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
13: [2023-04-27 00:01:38,699] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,700] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,700] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,701] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
13: [2023-04-27 00:01:38,702] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 13: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 7: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 7: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 31: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 7: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
17: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 7: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 7: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
2: [2023-04-27 00:01:38,703] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 20: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 26: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
9: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 31: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 26: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,704] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
26: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 26: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 26: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 6: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 9: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
9: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,705] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 9: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 9: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
7: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
28: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 28: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
28: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,706] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
4: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 9: [2023-04-27 00:01:38,707] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 9: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
9: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,708] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
4: [2023-04-27 00:01:38,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,709] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,709] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 21: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
21: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,710] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
27: [2023-04-27 00:01:38,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,711] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 21: [2023-04-27 00:01:38,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 22: [2023-04-27 00:01:38,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:38,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 18: [2023-04-27 00:01:38,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,712] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 22: [2023-04-27 00:01:38,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,712] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
22: [2023-04-27 00:01:38,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,713] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 2: [2023-04-27 00:01:38,713] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 6: [2023-04-27 00:01:38,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,714] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 21: [2023-04-27 00:01:38,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 22: [2023-04-27 00:01:38,714] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:38,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
19: [2023-04-27 00:01:38,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,715] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 19: [2023-04-27 00:01:38,715] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 19: [2023-04-27 00:01:38,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 19: [2023-04-27 00:01:38,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
19: [2023-04-27 00:01:38,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 19: [2023-04-27 00:01:38,716] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 22: [2023-04-27 00:01:38,716] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 21: [2023-04-27 00:01:38,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:38,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 5: [2023-04-27 00:01:38,717] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
13: [2023-04-27 00:01:38,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,717] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 7: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,718] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:38,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,719] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,719] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,720] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,720] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 9: [2023-04-27 00:01:38,721] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 21: [2023-04-27 00:01:38,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,721] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 
5: [2023-04-27 00:01:38,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,722] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,722] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 10: [2023-04-27 00:01:38,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,723] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,724] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,723] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,724] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 18: [2023-04-27 00:01:38,725] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,725] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,726] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
26: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 26: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,727] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 4: [2023-04-27 00:01:38,728] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
18: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 7: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
0: [2023-04-27 00:01:38,729] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
0: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,731] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,730] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 9: [2023-04-27 00:01:38,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 3: [2023-04-27 00:01:38,731] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
0: [2023-04-27 00:01:38,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:38,732] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,732] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 18: [2023-04-27 00:01:38,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:38,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 7: [2023-04-27 00:01:38,733] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,733] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
4: [2023-04-27 00:01:38,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:38,734] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,734] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 10: [2023-04-27 00:01:38,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:38,735] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:38,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
26: [2023-04-27 00:01:38,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,736] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,736] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 28: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 28: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 28: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
2: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 10: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 9: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,737] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 13: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 3: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 10: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,738] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
25: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 28: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 28: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:38,739] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
23: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 7: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 28: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 10: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,740] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 7: [2023-04-27 00:01:38,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,741] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 23: [2023-04-27 00:01:38,741] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 5: [2023-04-27 00:01:38,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 9: [2023-04-27 00:01:38,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
26: [2023-04-27 00:01:38,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 3: [2023-04-27 00:01:38,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:38,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,742] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,743] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 25: [2023-04-27 00:01:38,743] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 4: [2023-04-27 00:01:38,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 26: [2023-04-27 00:01:38,744] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
19: [2023-04-27 00:01:38,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 27: [2023-04-27 00:01:38,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 25: [2023-04-27 00:01:38,745] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 7: [2023-04-27 00:01:38,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 9: [2023-04-27 00:01:38,746] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,746] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:38,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:38,747] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
25: [2023-04-27 00:01:38,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:38,747] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 4: [2023-04-27 00:01:38,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 25: [2023-04-27 00:01:38,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,748] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 7: [2023-04-27 00:01:38,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:38,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 28: [2023-04-27 00:01:38,748] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 19: [2023-04-27 00:01:38,749] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 4: [2023-04-27 00:01:38,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:38,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,749] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,750] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 9: [2023-04-27 00:01:38,750] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
3: [2023-04-27 00:01:38,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:38,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 27: [2023-04-27 00:01:38,751] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 10: [2023-04-27 00:01:38,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:38,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:38,752] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 7: [2023-04-27 00:01:38,752] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:38,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
25: [2023-04-27 00:01:38,753] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 27: [2023-04-27 00:01:38,754] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:38,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 9: [2023-04-27 00:01:38,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:38,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:38,755] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:38,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 23: [2023-04-27 00:01:38,755] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
27: [2023-04-27 00:01:38,756] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:38,757] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,757] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,758] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 17: [2023-04-27 00:01:38,758] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
19: [2023-04-27 00:01:38,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 9: [2023-04-27 00:01:38,759] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:38,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,760] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,760] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 19: [2023-04-27 00:01:38,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 3: [2023-04-27 00:01:38,761] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
23: [2023-04-27 00:01:38,761] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,762] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,762] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
3: [2023-04-27 00:01:38,763] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:38,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 23: [2023-04-27 00:01:38,763] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 23: [2023-04-27 00:01:38,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:38,764] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,764] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 14: [2023-04-27 00:01:38,766] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 17: [2023-04-27 00:01:38,766] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
17: [2023-04-27 00:01:38,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt... 19: [2023-04-27 00:01:38,767] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,767] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 23: [2023-04-27 00:01:38,768] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:38,769] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,770] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
0: [2023-04-27 00:01:38,771] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 23: [2023-04-27 00:01:38,772] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:38,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:38,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,774] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,774] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:38,777] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,776] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
14: [2023-04-27 00:01:38,778] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 0: [2023-04-27 00:01:38,780] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 0: [2023-04-27 00:01:38,780] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:38,781] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,785] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,787] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 0: [2023-04-27 00:01:38,788] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 
14: [2023-04-27 00:01:38,789] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 23: [2023-04-27 00:01:38,790] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,799] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,801] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 14: [2023-04-27 00:01:38,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,802] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,802] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,803] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
8: [2023-04-27 00:01:38,803] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,804] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,804] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,806] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,810] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,809] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,814] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 8: [2023-04-27 00:01:38,815] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,816] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
20: [2023-04-27 00:01:38,818] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,818] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_26-model_00-model_states.pt. 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 20: [2023-04-27 00:01:38,819] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,823] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:38,824] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,825] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,826] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,827] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
5: [2023-04-27 00:01:38,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,828] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,828] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,829] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 6: [2023-04-27 00:01:38,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,830] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
6: [2023-04-27 00:01:38,831] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,835] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,837] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:38,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:38,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:38,839] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:38,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,841] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
6: [2023-04-27 00:01:38,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:38,843] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,843] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 6: [2023-04-27 00:01:38,845] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,846] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:38,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:38,847] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,847] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
8: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
8: [2023-04-27 00:01:38,850] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,849] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,850] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,851] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
5: [2023-04-27 00:01:38,853] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,856] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
26: [2023-04-27 00:01:38,857] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,857] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,858] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,859] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
13: [2023-04-27 00:01:38,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,860] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,859] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,861] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,861] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,834] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,842] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,842] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:38,845] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:38,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
20: [2023-04-27 00:01:38,852] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:38,853] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 20: [2023-04-27 00:01:38,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:38,856] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:38,858] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,862] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,862] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 5: [2023-04-27 00:01:38,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,863] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,864] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 8: [2023-04-27 00:01:38,864] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,865] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:38,865] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,869] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
2: [2023-04-27 00:01:38,872] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 5: [2023-04-27 00:01:38,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,873] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,874] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 8: [2023-04-27 00:01:38,874] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,877] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,877] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 19: [2023-04-27 00:01:38,876] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,878] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
26: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 2: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
26: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,879] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,880] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,880] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,882] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,883] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
14: [2023-04-27 00:01:38,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,884] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,884] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 14: [2023-04-27 00:01:38,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 
14: [2023-04-27 00:01:38,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 13: [2023-04-27 00:01:38,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,885] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,885] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:38,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,886] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,887] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 2: [2023-04-27 00:01:38,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,888] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 17: [2023-04-27 00:01:38,888] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt... 26: [2023-04-27 00:01:38,889] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,890] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
16: [2023-04-27 00:01:38,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,892] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,892] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
16: [2023-04-27 00:01:38,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,893] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,893] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,894] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,895] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
19: [2023-04-27 00:01:38,895] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,897] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,898] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,900] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,901] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 13: [2023-04-27 00:01:38,901] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
19: [2023-04-27 00:01:38,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,902] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,904] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,905] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,905] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:38,906] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:38,863] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,907] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 26: [2023-04-27 00:01:38,907] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
17: [2023-04-27 00:01:38,908] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:38,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:38,910] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,911] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,911] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:38,912] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 14: [2023-04-27 00:01:38,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
14: [2023-04-27 00:01:38,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,913] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,913] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,914] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:38,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,915] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
13: [2023-04-27 00:01:38,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,916] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 19: [2023-04-27 00:01:38,916] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,923] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 16: [2023-04-27 00:01:38,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,923] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,924] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
14: [2023-04-27 00:01:38,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,925] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,925] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:38,926] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 17: [2023-04-27 00:01:38,927] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,928] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
1: [2023-04-27 00:01:38,929] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,930] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:38,930] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:38,931] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,931] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,932] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,933] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
16: [2023-04-27 00:01:38,933] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:38,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,934] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,934] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,935] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:38,937] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 
17: [2023-04-27 00:01:38,937] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 16: [2023-04-27 00:01:38,938] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:38,939] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:38,939] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:38,940] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,941] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:38,942] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_27-model_00-model_states.pt. 1: [2023-04-27 00:01:38,944] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:38,946] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
17: [2023-04-27 00:01:38,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,950] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,951] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,953] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 1: [2023-04-27 00:01:38,953] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,954] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 1: [2023-04-27 00:01:38,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
1: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,956] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,956] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,957] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,958] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 1: [2023-04-27 00:01:38,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,959] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
31: [2023-04-27 00:01:38,959] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:38,961] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 1: [2023-04-27 00:01:38,962] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,962] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 1: [2023-04-27 00:01:38,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,963] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 1: [2023-04-27 00:01:38,963] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 1: [2023-04-27 00:01:38,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
1: [2023-04-27 00:01:38,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 1: [2023-04-27 00:01:38,964] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,964] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,970] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,972] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 1: [2023-04-27 00:01:38,973] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,973] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:38,974] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
30: [2023-04-27 00:01:38,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,975] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:38,976] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,977] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:38,979] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
1: [2023-04-27 00:01:38,980] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 1: [2023-04-27 00:01:38,980] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,984] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,984] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:38,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
24: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,985] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
24: [2023-04-27 00:01:38,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,986] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:38,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:38,986] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:38,987] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,988] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,990] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
31: [2023-04-27 00:01:38,991] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:38,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,991] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,992] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,992] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,993] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
31: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:38,994] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:38,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:38,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,995] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:38,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
31: [2023-04-27 00:01:38,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:38,996] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 31: [2023-04-27 00:01:38,996] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:38,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:38,999] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 31: [2023-04-27 00:01:39,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:39,002] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,002] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 31: [2023-04-27 00:01:39,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
31: [2023-04-27 00:01:39,003] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,003] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,004] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:39,004] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,005] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
30: [2023-04-27 00:01:39,005] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:39,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:39,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:39,006] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:39,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,007] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 30: [2023-04-27 00:01:39,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
30: [2023-04-27 00:01:39,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,008] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,008] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
30: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 11: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,009] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
15: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 15: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 15: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 30: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
30: [2023-04-27 00:01:39,010] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,011] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 15: [2023-04-27 00:01:39,012] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:39,013] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:39,013] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:39,014] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
18: [2023-04-27 00:01:39,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:39,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:39,015] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,015] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,016] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
24: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,017] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
11: [2023-04-27 00:01:39,018] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:39,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:39,018] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,019] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,019] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,020] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,021] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
18: [2023-04-27 00:01:39,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,022] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 30: [2023-04-27 00:01:39,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,024] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,024] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,025] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
18: [2023-04-27 00:01:39,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,025] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,026] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,026] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 24: [2023-04-27 00:01:39,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 24: [2023-04-27 00:01:39,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,027] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 11: [2023-04-27 00:01:39,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
24: [2023-04-27 00:01:39,027] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,029] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,029] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:39,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 18: [2023-04-27 00:01:39,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,030] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,030] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,032] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
15: [2023-04-27 00:01:39,032] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,033] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:39,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 16: [2023-04-27 00:01:39,033] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:39,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
16: [2023-04-27 00:01:39,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 16: [2023-04-27 00:01:39,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 16: [2023-04-27 00:01:39,034] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,034] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,036] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:39,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
16: [2023-04-27 00:01:39,036] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
4: [2023-04-27 00:01:39,037] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:39,038] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 4: [2023-04-27 00:01:39,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 18: [2023-04-27 00:01:39,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 18: [2023-04-27 00:01:39,039] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,039] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,042] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,043] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
15: [2023-04-27 00:01:39,043] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,044] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,045] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,046] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,046] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
15: [2023-04-27 00:01:39,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,047] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,047] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 1: [2023-04-27 00:01:39,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 21: [2023-04-27 00:01:39,050] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 1: [2023-04-27 00:01:39,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,051] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
15: [2023-04-27 00:01:39,052] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,051] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 15: [2023-04-27 00:01:39,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,053] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 15: [2023-04-27 00:01:39,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,054] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,054] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,055] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,056] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
4: [2023-04-27 00:01:39,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,058] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,058] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
7: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,059] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
27: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,060] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
15: [2023-04-27 00:01:39,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,061] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,061] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,062] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 25: [2023-04-27 00:01:39,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 7: [2023-04-27 00:01:39,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:39,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:39,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
27: [2023-04-27 00:01:39,062] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:39,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,065] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,066] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:39,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 4: [2023-04-27 00:01:39,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,067] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,067] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
4: [2023-04-27 00:01:39,068] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 4: [2023-04-27 00:01:39,069] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 4: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 4: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 4: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 4: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
4: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,070] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
4: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:39,071] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
29: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 28: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,072] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 12: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
23: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 9: [2023-04-27 00:01:39,073] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 12: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 12: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 12: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
12: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 12: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 9: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,074] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
23: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
23: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 12: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
12: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 10: [2023-04-27 00:01:39,075] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 10: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
10: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 21: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 4: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 9: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 10: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
4: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 4: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,076] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 22: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 22: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
22: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,077] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
22: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 10: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 22: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
0: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
5: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,078] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:39,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:39,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:39,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:39,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
8: [2023-04-27 00:01:39,079] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,079] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:39,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 22: [2023-04-27 00:01:39,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 21: [2023-04-27 00:01:39,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 6: [2023-04-27 00:01:39,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 21: [2023-04-27 00:01:39,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
21: [2023-04-27 00:01:39,080] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,080] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 22: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 5: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 10: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 8: [2023-04-27 00:01:39,081] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
10: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 21: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 31: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 31: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 31: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 31: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 31: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 31: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 31: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 
10: [2023-04-27 00:01:39,082] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 27: [2023-04-27 00:01:39,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 27: [2023-04-27 00:01:39,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,083] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
21: [2023-04-27 00:01:39,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,083] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 27: [2023-04-27 00:01:39,084] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,084] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
27: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 27: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
25: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,085] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 21: [2023-04-27 00:01:39,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,086] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,086] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,087] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,087] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 25: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
25: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 30: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 30: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 30: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 30: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 30: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 30: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 30: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 30: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 25: [2023-04-27 00:01:39,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
22: [2023-04-27 00:01:39,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,088] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 12: [2023-04-27 00:01:39,089] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,089] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
21: [2023-04-27 00:01:39,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,090] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 27: [2023-04-27 00:01:39,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
25: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,090] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
12: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 12: [2023-04-27 00:01:39,091] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,092] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,092] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
12: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 11: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 11: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 7: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 11: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 11: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 11: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 11: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 11: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 
7: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,093] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,094] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,094] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
25: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 25: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
13: [2023-04-27 00:01:39,095] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 25: [2023-04-27 00:01:39,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 25: [2023-04-27 00:01:39,096] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,096] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 25: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
22: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 24: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 24: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 
24: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 24: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 25: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 24: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 24: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 24: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 24: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 25: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 29: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
22: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 6: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,097] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
7: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,098] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
27: [2023-04-27 00:01:39,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 27: [2023-04-27 00:01:39,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 10: [2023-04-27 00:01:39,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 3: [2023-04-27 00:01:39,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,099] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,099] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 23: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
27: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 6: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
6: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,100] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
10: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 27: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 29: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 29: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 21: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
29: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 21: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,101] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 29: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
28: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 6: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 6: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
9: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,102] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
28: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
23: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,103] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,104] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,104] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
23: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
29: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
23: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 29: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,105] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
29: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 29: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
29: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 10: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 29: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,106] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
22: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 6: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
0: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
12: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,107] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
22: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 6: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
3: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
28: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 12: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 27: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
9: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 29: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 2: [2023-04-27 00:01:39,108] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 29: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 28: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 28: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,109] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
28: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
23: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
10: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 0: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
9: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:39,110] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 22: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 10: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 6: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
3: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 6: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
6: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 20: [2023-04-27 00:01:39,111] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 20: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 20: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 6: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
6: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,112] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
20: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 20: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 20: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 20: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 23: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
26: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 25: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
10: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 23: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
7: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 25: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,113] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 10: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
3: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 10: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 12: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 10: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 3: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
3: [2023-04-27 00:01:39,114] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:39,115] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,115] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,116] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
29: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 18: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 18: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 8: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
8: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,117] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 29: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 27: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 29: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 29: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
27: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 29: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 27: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 27: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,118] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
25: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 27: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 25: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 12: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 25: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
13: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 28: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,119] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 28: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 5: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
0: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 15: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 0: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 15: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 0: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 15: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 15: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 15: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 
15: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 0: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,120] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 9: [2023-04-27 00:01:39,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,121] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
13: [2023-04-27 00:01:39,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,121] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: [2023-04-27 00:01:39,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,122] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,122] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 7: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
7: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 3: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,124] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,123] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,124] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
22: [2023-04-27 00:01:39,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,125] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,125] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 0: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
5: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,126] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,127] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,127] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
8: [2023-04-27 00:01:39,128] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,128] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 2: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 2: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
5: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,129] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 0: > using checkpoint value 0.0002 for learning rate 2: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 0: > using checkpoint value 2e-05 for minimum learning rate 0: > using checkpoint value 1220703 for warmup iterations 0: > using checkpoint value 122070313 for total number of iterations 0: > using checkpoint value cosine for decay style 22: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 5: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
22: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 9: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 5: [2023-04-27 00:01:39,130] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 22: [2023-04-27 00:01:39,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
22: [2023-04-27 00:01:39,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,131] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 22: [2023-04-27 00:01:39,131] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
20: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,132] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 6: [2023-04-27 00:01:39,133] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
10: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 2: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 6: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 20: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
10: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,134] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 10: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 10: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
10: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,135] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:39,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:39,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
20: [2023-04-27 00:01:39,136] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,136] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:39,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 14: [2023-04-27 00:01:39,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:39,137] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 20: [2023-04-27 00:01:39,137] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
14: [2023-04-27 00:01:39,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 13: [2023-04-27 00:01:39,138] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,138] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 10: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 10: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
19: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 10: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,139] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:39,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:39,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
2: [2023-04-27 00:01:39,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:39,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:39,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,140] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,140] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 20: [2023-04-27 00:01:39,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,141] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,141] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
26: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
20: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,142] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,143] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
17: [2023-04-27 00:01:39,143] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 13: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,144] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
20: [2023-04-27 00:01:39,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 13: [2023-04-27 00:01:39,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,145] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,145] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
19: [2023-04-27 00:01:39,146] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:39,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,147] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,147] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:39,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,148] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
17: [2023-04-27 00:01:39,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:39,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 17: [2023-04-27 00:01:39,148] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 26: [2023-04-27 00:01:39,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,149] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 8: [2023-04-27 00:01:39,149] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 8: [2023-04-27 00:01:39,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
8: [2023-04-27 00:01:39,150] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 8: [2023-04-27 00:01:39,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,150] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 4: [2023-04-27 00:01:39,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-04-27 00:01:39,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-04-27 00:01:39,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-04-27 00:01:39,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-04-27 00:01:39,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-04-27 00:01:39,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-04-27 00:01:39,151] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-04-27 00:01:39,152] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 14: [2023-04-27 00:01:39,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 20: [2023-04-27 00:01:39,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 20: [2023-04-27 00:01:39,152] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,153] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
26: [2023-04-27 00:01:39,154] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,155] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,155] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,156] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 2: [2023-04-27 00:01:39,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 
2: [2023-04-27 00:01:39,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,157] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 2: [2023-04-27 00:01:39,157] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:39,158] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,159] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,158] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt... 19: [2023-04-27 00:01:39,159] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
27: [2023-04-27 00:01:39,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2023-04-27 00:01:39,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 27: [2023-04-27 00:01:39,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 27: [2023-04-27 00:01:39,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 27: [2023-04-27 00:01:39,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 27: [2023-04-27 00:01:39,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 27: [2023-04-27 00:01:39,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 27: [2023-04-27 00:01:39,160] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 26: [2023-04-27 00:01:39,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,161] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
19: [2023-04-27 00:01:39,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 26: [2023-04-27 00:01:39,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,161] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 26: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 26: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
20: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,162] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,163] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 
14: [2023-04-27 00:01:39,164] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,165] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,165] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
14: [2023-04-27 00:01:39,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,166] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,166] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,167] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,167] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:39,168] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,171] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 21: [2023-04-27 00:01:39,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 
21: [2023-04-27 00:01:39,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 21: [2023-04-27 00:01:39,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 21: [2023-04-27 00:01:39,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 21: [2023-04-27 00:01:39,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2023-04-27 00:01:39,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 21: [2023-04-27 00:01:39,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 21: [2023-04-27 00:01:39,172] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 25: [2023-04-27 00:01:39,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 25: [2023-04-27 00:01:39,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 25: [2023-04-27 00:01:39,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 
25: [2023-04-27 00:01:39,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 25: [2023-04-27 00:01:39,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 25: [2023-04-27 00:01:39,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-27 00:01:39,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 25: [2023-04-27 00:01:39,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 19: [2023-04-27 00:01:39,173] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,173] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:39,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 
19: [2023-04-27 00:01:39,174] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,174] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:39,175] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 130 19: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 7: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 19: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 7: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 14: [2023-04-27 00:01:39,175] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:39,176] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
14: [2023-04-27 00:01:39,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,176] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 
14: [2023-04-27 00:01:39,177] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,177] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,178] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,178] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 9: [2023-04-27 00:01:39,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 9: [2023-04-27 00:01:39,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 9: [2023-04-27 00:01:39,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 9: [2023-04-27 00:01:39,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 
9: [2023-04-27 00:01:39,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 9: [2023-04-27 00:01:39,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 9: [2023-04-27 00:01:39,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 9: [2023-04-27 00:01:39,181] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 19: [2023-04-27 00:01:39,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 19: [2023-04-27 00:01:39,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:39,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 19: [2023-04-27 00:01:39,182] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,182] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
19: [2023-04-27 00:01:39,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:39,183] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 129 19: [2023-04-27 00:01:39,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,183] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,183] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 23: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 23: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 28: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 28: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 
28: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 23: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 23: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 23: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 28: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 28: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 28: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 28: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 28: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 23: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 
23: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 23: [2023-04-27 00:01:39,184] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 14: [2023-04-27 00:01:39,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,185] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,185] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 17: [2023-04-27 00:01:39,186] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 6: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 17: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 22: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 22: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 
22: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 22: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 22: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 22: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2023-04-27 00:01:39,187] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 8: [2023-04-27 00:01:39,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 8: [2023-04-27 00:01:39,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 8: [2023-04-27 00:01:39,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 8: [2023-04-27 00:01:39,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 8: [2023-04-27 00:01:39,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 
8: [2023-04-27 00:01:39,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 8: [2023-04-27 00:01:39,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 8: [2023-04-27 00:01:39,188] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 14: [2023-04-27 00:01:39,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 14: [2023-04-27 00:01:39,188] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 10: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 10: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 10: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 10: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 10: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 
10: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 12: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 12: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 12: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 10: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 10: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 12: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 12: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 12: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 12: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 
12: [2023-04-27 00:01:39,189] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 29: [2023-04-27 00:01:39,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 29: [2023-04-27 00:01:39,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 29: [2023-04-27 00:01:39,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2023-04-27 00:01:39,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 29: [2023-04-27 00:01:39,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 29: [2023-04-27 00:01:39,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 22: [2023-04-27 00:01:39,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 29: [2023-04-27 00:01:39,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 29: [2023-04-27 00:01:39,190] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 
0: [2023-04-27 00:01:39,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-04-27 00:01:39,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-04-27 00:01:39,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-04-27 00:01:39,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-04-27 00:01:39,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-04-27 00:01:39,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-04-27 00:01:39,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-04-27 00:01:39,192] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 16: [2023-04-27 00:01:39,192] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 
16: [2023-04-27 00:01:39,192] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 134 16: [2023-04-27 00:01:39,194] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 5: [2023-04-27 00:01:39,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-04-27 00:01:39,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-04-27 00:01:39,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-04-27 00:01:39,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-04-27 00:01:39,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-04-27 00:01:39,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-04-27 00:01:39,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-04-27 00:01:39,194] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
16: [2023-04-27 00:01:39,194] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 128 17: [2023-04-27 00:01:39,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,196] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,196] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,197] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,197] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:39,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 
14: [2023-04-27 00:01:39,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,198] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 16: [2023-04-27 00:01:39,198] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 133 14: [2023-04-27 00:01:39,198] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 14: [2023-04-27 00:01:39,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 14: [2023-04-27 00:01:39,199] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,199] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
17: [2023-04-27 00:01:39,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 19: [2023-04-27 00:01:39,200] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,200] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 19: [2023-04-27 00:01:39,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
17: [2023-04-27 00:01:39,201] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,201] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,202] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 3: [2023-04-27 00:01:39,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-04-27 00:01:39,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-27 00:01:39,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-04-27 00:01:39,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-04-27 00:01:39,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-04-27 00:01:39,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-04-27 00:01:39,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-04-27 00:01:39,205] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 13: [2023-04-27 00:01:39,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 13: [2023-04-27 00:01:39,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2023-04-27 00:01:39,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 13: [2023-04-27 00:01:39,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 13: [2023-04-27 00:01:39,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 13: [2023-04-27 00:01:39,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2023-04-27 00:01:39,208] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 17: [2023-04-27 00:01:39,209] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 13: [2023-04-27 00:01:39,212] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 
17: [2023-04-27 00:01:39,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_28-model_00-model_states.pt. 16: [2023-04-27 00:01:39,214] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-27 00:01:39,215] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 131 17: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 20: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 20: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 17: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 20: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 20: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 
20: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 20: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 20: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 20: [2023-04-27 00:01:39,219] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 17: [2023-04-27 00:01:39,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 16: [2023-04-27 00:01:39,220] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-27 00:01:39,221] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-27 00:01:39,221] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 135 16: [2023-04-27 00:01:39,221] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 132 17: [2023-04-27 00:01:39,223] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 
17: [2023-04-27 00:01:39,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 17: [2023-04-27 00:01:39,224] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt... 17: [2023-04-27 00:01:39,224] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/layer_30-model_00-model_states.pt. 2: [2023-04-27 00:01:39,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-04-27 00:01:39,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-04-27 00:01:39,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-04-27 00:01:39,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-04-27 00:01:39,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-04-27 00:01:39,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-04-27 00:01:39,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-04-27 00:01:39,229] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 26: [2023-04-27 00:01:39,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 26: [2023-04-27 00:01:39,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 26: [2023-04-27 00:01:39,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 26: [2023-04-27 00:01:39,238] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 26: [2023-04-27 00:01:39,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-27 00:01:39,239] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,240] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 26: [2023-04-27 00:01:39,241] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 
1: [2023-04-27 00:01:39,241] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 11 26: [2023-04-27 00:01:39,242] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-27 00:01:39,244] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-04-27 00:01:39,245] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 10 1: [2023-04-27 00:01:39,245] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 8 17: [2023-04-27 00:01:39,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 17: [2023-04-27 00:01:39,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 17: [2023-04-27 00:01:39,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 17: [2023-04-27 00:01:39,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 
17: [2023-04-27 00:01:39,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 17: [2023-04-27 00:01:39,248] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,249] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-27 00:01:39,249] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 12 1: [2023-04-27 00:01:39,250] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 9 11: [2023-04-27 00:01:39,250] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-27 00:01:39,251] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 88 30: [2023-04-27 00:01:39,251] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-27 00:01:39,251] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 243 11: [2023-04-27 00:01:39,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 31: [2023-04-27 00:01:39,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 
17: [2023-04-27 00:01:39,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 31: [2023-04-27 00:01:39,252] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 255 11: [2023-04-27 00:01:39,252] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 17: [2023-04-27 00:01:39,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 11: [2023-04-27 00:01:39,252] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 92 11: [2023-04-27 00:01:39,252] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 89 14: [2023-04-27 00:01:39,252] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 14: [2023-04-27 00:01:39,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 14: [2023-04-27 00:01:39,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 14: [2023-04-27 00:01:39,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 
14: [2023-04-27 00:01:39,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 14: [2023-04-27 00:01:39,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 14: [2023-04-27 00:01:39,253] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,255] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 14: [2023-04-27 00:01:39,256] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 1: [2023-04-27 00:01:39,256] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-04-27 00:01:39,256] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 14 1: [2023-04-27 00:01:39,257] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 15 30: [2023-04-27 00:01:39,260] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-27 00:01:39,260] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 241 11: [2023-04-27 00:01:39,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 
31: [2023-04-27 00:01:39,261] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-27 00:01:39,262] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 253 11: [2023-04-27 00:01:39,262] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 95 31: [2023-04-27 00:01:39,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-27 00:01:39,264] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 254 30: [2023-04-27 00:01:39,264] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-27 00:01:39,264] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 242 1: [2023-04-27 00:01:39,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 24: [2023-04-27 00:01:39,266] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 
24: [2023-04-27 00:01:39,266] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 194 1: [2023-04-27 00:01:39,267] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 13 18: [2023-04-27 00:01:39,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 27: [2023-04-27 00:01:39,268] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 18: [2023-04-27 00:01:39,268] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 144 27: [2023-04-27 00:01:39,268] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 219 24: [2023-04-27 00:01:39,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 31: [2023-04-27 00:01:39,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 31: [2023-04-27 00:01:39,269] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 252 24: [2023-04-27 00:01:39,269] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 193 11: [2023-04-27 00:01:39,269] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 
11: [2023-04-27 00:01:39,269] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 94 19: [2023-04-27 00:01:39,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 19: [2023-04-27 00:01:39,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 19: [2023-04-27 00:01:39,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 19: [2023-04-27 00:01:39,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 19: [2023-04-27 00:01:39,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 19: [2023-04-27 00:01:39,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 19: [2023-04-27 00:01:39,270] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 30: [2023-04-27 00:01:39,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 
30: [2023-04-27 00:01:39,270] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 245 11: [2023-04-27 00:01:39,270] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-27 00:01:39,271] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 91 24: [2023-04-27 00:01:39,272] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-27 00:01:39,272] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 197 19: [2023-04-27 00:01:39,272] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 11: [2023-04-27 00:01:39,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 27: [2023-04-27 00:01:39,273] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2023-04-27 00:01:39,274] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 218 11: [2023-04-27 00:01:39,274] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 93 24: [2023-04-27 00:01:39,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 
24: [2023-04-27 00:01:39,274] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-27 00:01:39,275] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 192 24: [2023-04-27 00:01:39,275] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 196 18: [2023-04-27 00:01:39,275] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2023-04-27 00:01:39,276] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 148 31: [2023-04-27 00:01:39,276] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-27 00:01:39,277] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 248 30: [2023-04-27 00:01:39,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-27 00:01:39,278] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 240 31: [2023-04-27 00:01:39,278] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 
31: [2023-04-27 00:01:39,278] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 249 24: [2023-04-27 00:01:39,284] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-27 00:01:39,285] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 195 30: [2023-04-27 00:01:39,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 30: [2023-04-27 00:01:39,285] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 246 24: [2023-04-27 00:01:39,285] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-27 00:01:39,286] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 199 11: [2023-04-27 00:01:39,286] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-27 00:01:39,287] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 90 18: [2023-04-27 00:01:39,288] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 
18: [2023-04-27 00:01:39,289] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 149 30: [2023-04-27 00:01:39,289] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-27 00:01:39,289] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 247 18: [2023-04-27 00:01:39,290] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-27 00:01:39,291] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 150 18: [2023-04-27 00:01:39,291] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-27 00:01:39,292] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 18: [2023-04-27 00:01:39,292] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 147 18: [2023-04-27 00:01:39,292] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 151 31: [2023-04-27 00:01:39,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 
31: [2023-04-27 00:01:39,293] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 251 15: [2023-04-27 00:01:39,293] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 15: [2023-04-27 00:01:39,294] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 120 24: [2023-04-27 00:01:39,299] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2023-04-27 00:01:39,300] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 198 10: [2023-04-27 00:01:39,300] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 15: [2023-04-27 00:01:39,301] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 10: [2023-04-27 00:01:39,301] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 83 15: [2023-04-27 00:01:39,301] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 127 18: [2023-04-27 00:01:39,304] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 
18: [2023-04-27 00:01:39,305] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 146 15: [2023-04-27 00:01:39,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-27 00:01:39,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 15: [2023-04-27 00:01:39,305] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 125 30: [2023-04-27 00:01:39,305] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 15: [2023-04-27 00:01:39,305] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 121 30: [2023-04-27 00:01:39,306] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 244 31: [2023-04-27 00:01:39,306] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-27 00:01:39,306] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 250 15: [2023-04-27 00:01:39,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 
15: [2023-04-27 00:01:39,307] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 126 4: [2023-04-27 00:01:39,282] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-27 00:01:39,283] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 34 4: [2023-04-27 00:01:39,283] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-04-27 00:01:39,284] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 37 4: [2023-04-27 00:01:39,307] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-04-27 00:01:39,308] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 38 18: [2023-04-27 00:01:39,309] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2023-04-27 00:01:39,309] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 145 15: [2023-04-27 00:01:39,310] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 
15: [2023-04-27 00:01:39,311] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 122 13: [2023-04-27 00:01:39,311] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-27 00:01:39,311] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 108 15: [2023-04-27 00:01:39,312] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-27 00:01:39,312] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 124 15: [2023-04-27 00:01:39,313] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-27 00:01:39,314] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 123 10: [2023-04-27 00:01:39,322] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-27 00:01:39,322] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 84 10: [2023-04-27 00:01:39,323] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 
10: [2023-04-27 00:01:39,324] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 85 0: [2023-04-27 00:01:39,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-27 00:01:39,329] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 1 4: [2023-04-27 00:01:39,328] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-04-27 00:01:39,329] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 32 27: [2023-04-27 00:01:39,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-27 00:01:39,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-27 00:01:39,331] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 223 27: [2023-04-27 00:01:39,331] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 216 9: [2023-04-27 00:01:39,331] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 
9: [2023-04-27 00:01:39,332] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 75 27: [2023-04-27 00:01:39,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-27 00:01:39,333] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 222 25: [2023-04-27 00:01:39,336] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2023-04-27 00:01:39,336] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 206 9: [2023-04-27 00:01:39,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 25: [2023-04-27 00:01:39,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-27 00:01:39,337] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 203 9: [2023-04-27 00:01:39,337] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 77 27: [2023-04-27 00:01:39,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 8: [2023-04-27 00:01:39,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 
8: [2023-04-27 00:01:39,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 27: [2023-04-27 00:01:39,339] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 217 8: [2023-04-27 00:01:39,339] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 69 9: [2023-04-27 00:01:39,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 8: [2023-04-27 00:01:39,339] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 67 8: [2023-04-27 00:01:39,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 9: [2023-04-27 00:01:39,340] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 79 8: [2023-04-27 00:01:39,340] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 71 4: [2023-04-27 00:01:39,332] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-04-27 00:01:39,333] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 33 5: [2023-04-27 00:01:39,335] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
6: [2023-04-27 00:01:39,337] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 4: [2023-04-27 00:01:39,333] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-04-27 00:01:39,334] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 35 4: [2023-04-27 00:01:39,334] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-27 00:01:39,335] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 39 4: [2023-04-27 00:01:39,340] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 5: [2023-04-27 00:01:39,335] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 45 6: [2023-04-27 00:01:39,338] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 48 4: [2023-04-27 00:01:39,341] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 36 5: [2023-04-27 00:01:39,339] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-04-27 00:01:39,339] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 41 9: [2023-04-27 00:01:39,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 10: [2023-04-27 00:01:39,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 5: [2023-04-27 00:01:39,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 25: [2023-04-27 00:01:39,342] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-27 00:01:39,343] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 201 9: [2023-04-27 00:01:39,343] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 74 10: [2023-04-27 00:01:39,343] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 86 25: [2023-04-27 00:01:39,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 27: [2023-04-27 00:01:39,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 9: [2023-04-27 00:01:39,343] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 
5: [2023-04-27 00:01:39,343] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 43 27: [2023-04-27 00:01:39,344] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 221 25: [2023-04-27 00:01:39,344] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 200 9: [2023-04-27 00:01:39,344] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 72 23: [2023-04-27 00:01:39,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2023-04-27 00:01:39,345] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 184 8: [2023-04-27 00:01:39,345] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 8: [2023-04-27 00:01:39,346] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 64 8: [2023-04-27 00:01:39,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 21: [2023-04-27 00:01:39,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 21: [2023-04-27 00:01:39,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 
21: [2023-04-27 00:01:39,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2023-04-27 00:01:39,348] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 21: [2023-04-27 00:01:39,348] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 173 21: [2023-04-27 00:01:39,348] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 168 21: [2023-04-27 00:01:39,349] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 174 21: [2023-04-27 00:01:39,349] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 171 8: [2023-04-27 00:01:39,349] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 70 22: [2023-04-27 00:01:39,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 22: [2023-04-27 00:01:39,349] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 179 6: [2023-04-27 00:01:39,349] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-27 00:01:39,350] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 53 22: [2023-04-27 00:01:39,350] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 
6: [2023-04-27 00:01:39,351] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 22: [2023-04-27 00:01:39,351] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 177 6: [2023-04-27 00:01:39,351] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 55 25: [2023-04-27 00:01:39,353] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 25: [2023-04-27 00:01:39,354] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 205 28: [2023-04-27 00:01:39,354] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-27 00:01:39,354] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 227 25: [2023-04-27 00:01:39,355] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-27 00:01:39,356] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 204 25: [2023-04-27 00:01:39,357] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 
25: [2023-04-27 00:01:39,357] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 207 8: [2023-04-27 00:01:39,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 9: [2023-04-27 00:01:39,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 0: [2023-04-27 00:01:39,359] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 9: [2023-04-27 00:01:39,360] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 78 8: [2023-04-27 00:01:39,360] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 65 9: [2023-04-27 00:01:39,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 0: [2023-04-27 00:01:39,360] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 6 10: [2023-04-27 00:01:39,360] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 
9: [2023-04-27 00:01:39,361] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 76 10: [2023-04-27 00:01:39,361] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 80 12: [2023-04-27 00:01:39,361] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-27 00:01:39,362] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 98 23: [2023-04-27 00:01:39,362] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-27 00:01:39,362] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 187 10: [2023-04-27 00:01:39,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 12: [2023-04-27 00:01:39,363] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2023-04-27 00:01:39,364] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 103 10: [2023-04-27 00:01:39,364] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 82 0: [2023-04-27 00:01:39,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
10: [2023-04-27 00:01:39,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 22: [2023-04-27 00:01:39,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 23: [2023-04-27 00:01:39,364] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 10: [2023-04-27 00:01:39,364] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 81 0: [2023-04-27 00:01:39,365] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 7 22: [2023-04-27 00:01:39,365] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 178 23: [2023-04-27 00:01:39,365] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 188 6: [2023-04-27 00:01:39,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 28: [2023-04-27 00:01:39,365] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 
6: [2023-04-27 00:01:39,365] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 51 28: [2023-04-27 00:01:39,366] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 225 5: [2023-04-27 00:01:39,366] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-27 00:01:39,366] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 46 5: [2023-04-27 00:01:39,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 21: [2023-04-27 00:01:39,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 12: [2023-04-27 00:01:39,367] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 21: [2023-04-27 00:01:39,367] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 175 5: [2023-04-27 00:01:39,367] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 47 12: [2023-04-27 00:01:39,367] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 100 28: [2023-04-27 00:01:39,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 
28: [2023-04-27 00:01:39,368] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 224 21: [2023-04-27 00:01:39,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-27 00:01:39,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-27 00:01:39,368] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 169 21: [2023-04-27 00:01:39,368] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2023-04-27 00:01:39,369] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 172 17: [2023-04-27 00:01:39,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 28: [2023-04-27 00:01:39,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 21: [2023-04-27 00:01:39,369] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 170 28: [2023-04-27 00:01:39,369] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 229 5: [2023-04-27 00:01:39,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
8: [2023-04-27 00:01:39,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 17: [2023-04-27 00:01:39,369] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 138 27: [2023-04-27 00:01:39,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 29: [2023-04-27 00:01:39,369] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-27 00:01:39,370] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 232 27: [2023-04-27 00:01:39,370] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 220 23: [2023-04-27 00:01:39,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 8: [2023-04-27 00:01:39,370] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 66 5: [2023-04-27 00:01:39,370] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 44 29: [2023-04-27 00:01:39,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 28: [2023-04-27 00:01:39,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 
23: [2023-04-27 00:01:39,370] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 189 29: [2023-04-27 00:01:39,370] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 237 28: [2023-04-27 00:01:39,370] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 226 12: [2023-04-27 00:01:39,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 28: [2023-04-27 00:01:39,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 0: [2023-04-27 00:01:39,370] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 28: [2023-04-27 00:01:39,371] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 228 0: [2023-04-27 00:01:39,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 12: [2023-04-27 00:01:39,371] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 102 23: [2023-04-27 00:01:39,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 
0: [2023-04-27 00:01:39,371] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 0 0: [2023-04-27 00:01:39,371] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 2 23: [2023-04-27 00:01:39,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 7: [2023-04-27 00:01:39,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 23: [2023-04-27 00:01:39,371] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 191 7: [2023-04-27 00:01:39,371] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 23: [2023-04-27 00:01:39,372] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 185 7: [2023-04-27 00:01:39,372] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 60 7: [2023-04-27 00:01:39,372] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 62 10: [2023-04-27 00:01:39,372] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 12: [2023-04-27 00:01:39,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 
10: [2023-04-27 00:01:39,373] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 87 12: [2023-04-27 00:01:39,373] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 99 0: [2023-04-27 00:01:39,373] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-27 00:01:39,374] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 4 29: [2023-04-27 00:01:39,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 29: [2023-04-27 00:01:39,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-27 00:01:39,375] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 233 29: [2023-04-27 00:01:39,375] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 239 23: [2023-04-27 00:01:39,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 22: [2023-04-27 00:01:39,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2023-04-27 00:01:39,375] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 
22: [2023-04-27 00:01:39,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 23: [2023-04-27 00:01:39,376] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 190 22: [2023-04-27 00:01:39,376] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 181 22: [2023-04-27 00:01:39,376] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 182 22: [2023-04-27 00:01:39,376] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 183 6: [2023-04-27 00:01:39,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 7: [2023-04-27 00:01:39,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-04-27 00:01:39,376] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 6: [2023-04-27 00:01:39,377] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 52 7: [2023-04-27 00:01:39,377] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 61 7: [2023-04-27 00:01:39,377] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 59 13: [2023-04-27 00:01:39,377] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 
13: [2023-04-27 00:01:39,378] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 110 12: [2023-04-27 00:01:39,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 25: [2023-04-27 00:01:39,378] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 22: [2023-04-27 00:01:39,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 25: [2023-04-27 00:01:39,379] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 202 22: [2023-04-27 00:01:39,379] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 180 20: [2023-04-27 00:01:39,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 20: [2023-04-27 00:01:39,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 12: [2023-04-27 00:01:39,379] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 101 23: [2023-04-27 00:01:39,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 
20: [2023-04-27 00:01:39,379] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 166 20: [2023-04-27 00:01:39,379] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 165 23: [2023-04-27 00:01:39,379] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 186 7: [2023-04-27 00:01:39,379] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-27 00:01:39,380] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 63 0: [2023-04-27 00:01:39,380] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-27 00:01:39,381] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 3 7: [2023-04-27 00:01:39,381] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 22: [2023-04-27 00:01:39,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 7: [2023-04-27 00:01:39,382] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 56 22: [2023-04-27 00:01:39,382] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 176 6: [2023-04-27 00:01:39,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
5: [2023-04-27 00:01:39,382] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 6: [2023-04-27 00:01:39,383] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 50 12: [2023-04-27 00:01:39,383] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 5: [2023-04-27 00:01:39,383] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 40 12: [2023-04-27 00:01:39,383] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 97 20: [2023-04-27 00:01:39,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 29: [2023-04-27 00:01:39,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 28: [2023-04-27 00:01:39,384] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 
29: [2023-04-27 00:01:39,384] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 236 28: [2023-04-27 00:01:39,384] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 230 20: [2023-04-27 00:01:39,384] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 163 3: [2023-04-27 00:01:39,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 20: [2023-04-27 00:01:39,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 20: [2023-04-27 00:01:39,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 17: [2023-04-27 00:01:39,385] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 20: [2023-04-27 00:01:39,386] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 161 3: [2023-04-27 00:01:39,386] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 27 20: [2023-04-27 00:01:39,386] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 164 17: [2023-04-27 00:01:39,386] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 141 7: [2023-04-27 00:01:39,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
20: [2023-04-27 00:01:39,386] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 20: [2023-04-27 00:01:39,387] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 167 7: [2023-04-27 00:01:39,387] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 58 13: [2023-04-27 00:01:39,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 29: [2023-04-27 00:01:39,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 13: [2023-04-27 00:01:39,387] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 109 13: [2023-04-27 00:01:39,387] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 29: [2023-04-27 00:01:39,387] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 238 9: [2023-04-27 00:01:39,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 13: [2023-04-27 00:01:39,388] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 105 28: [2023-04-27 00:01:39,388] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 
28: [2023-04-27 00:01:39,388] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 231 9: [2023-04-27 00:01:39,388] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 73 29: [2023-04-27 00:01:39,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 2: [2023-04-27 00:01:39,389] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 29: [2023-04-27 00:01:39,389] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 235 2: [2023-04-27 00:01:39,389] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 16 29: [2023-04-27 00:01:39,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-27 00:01:39,390] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 234 3: [2023-04-27 00:01:39,390] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 7: [2023-04-27 00:01:39,391] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
3: [2023-04-27 00:01:39,391] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 29 7: [2023-04-27 00:01:39,391] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 57 3: [2023-04-27 00:01:39,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 26: [2023-04-27 00:01:39,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 26: [2023-04-27 00:01:39,392] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 212 3: [2023-04-27 00:01:39,392] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 30 26: [2023-04-27 00:01:39,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-27 00:01:39,393] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 214 3: [2023-04-27 00:01:39,393] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-27 00:01:39,394] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 26 13: [2023-04-27 00:01:39,394] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 
13: [2023-04-27 00:01:39,395] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 107 8: [2023-04-27 00:01:39,395] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 8: [2023-04-27 00:01:39,396] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 68 0: [2023-04-27 00:01:39,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 6: [2023-04-27 00:01:39,397] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-27 00:01:39,398] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 54 0: [2023-04-27 00:01:39,398] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 5 26: [2023-04-27 00:01:39,399] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 26: [2023-04-27 00:01:39,399] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 209 13: [2023-04-27 00:01:39,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 3: [2023-04-27 00:01:39,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-04-27 00:01:39,400] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 13: [2023-04-27 00:01:39,400] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 111 3: [2023-04-27 00:01:39,401] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 28 3: [2023-04-27 00:01:39,401] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 24 14: [2023-04-27 00:01:39,401] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-27 00:01:39,401] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 113 6: [2023-04-27 00:01:39,402] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-27 00:01:39,403] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 49 20: [2023-04-27 00:01:39,406] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2023-04-27 00:01:39,407] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 160 17: [2023-04-27 00:01:39,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 
17: [2023-04-27 00:01:39,407] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-27 00:01:39,408] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 137 17: [2023-04-27 00:01:39,408] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 139 13: [2023-04-27 00:01:39,409] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 13: [2023-04-27 00:01:39,409] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 106 14: [2023-04-27 00:01:39,410] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-27 00:01:39,411] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 115 20: [2023-04-27 00:01:39,411] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-27 00:01:39,412] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 162 12: [2023-04-27 00:01:39,412] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 
12: [2023-04-27 00:01:39,412] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 96 14: [2023-04-27 00:01:39,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2023-04-27 00:01:39,414] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 117 14: [2023-04-27 00:01:39,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-27 00:01:39,414] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 14: [2023-04-27 00:01:39,415] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 114 14: [2023-04-27 00:01:39,415] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 118 2: [2023-04-27 00:01:39,416] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-27 00:01:39,416] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 17 13: [2023-04-27 00:01:39,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 
13: [2023-04-27 00:01:39,417] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 104 17: [2023-04-27 00:01:39,417] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2023-04-27 00:01:39,417] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 140 2: [2023-04-27 00:01:39,418] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-27 00:01:39,418] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 23 26: [2023-04-27 00:01:39,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2023-04-27 00:01:39,419] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 208 17: [2023-04-27 00:01:39,419] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 17: [2023-04-27 00:01:39,420] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 143 5: [2023-04-27 00:01:39,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 26: [2023-04-27 00:01:39,421] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 
26: [2023-04-27 00:01:39,421] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 211 5: [2023-04-27 00:01:39,421] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 42 2: [2023-04-27 00:01:39,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-04-27 00:01:39,422] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-04-27 00:01:39,422] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 21 2: [2023-04-27 00:01:39,423] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 18 2: [2023-04-27 00:01:39,423] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-27 00:01:39,424] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-04-27 00:01:39,424] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 19 2: [2023-04-27 00:01:39,424] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 20 3: [2023-04-27 00:01:39,426] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-04-27 00:01:39,427] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 31 14: [2023-04-27 00:01:39,427] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2023-04-27 00:01:39,427] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 119 26: [2023-04-27 00:01:39,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-27 00:01:39,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 19: [2023-04-27 00:01:39,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-27 00:01:39,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 
26: [2023-04-27 00:01:39,429] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 213 26: [2023-04-27 00:01:39,429] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 215 19: [2023-04-27 00:01:39,430] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 156 19: [2023-04-27 00:01:39,430] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 155 19: [2023-04-27 00:01:39,429] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 17: [2023-04-27 00:01:39,430] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 19: [2023-04-27 00:01:39,430] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 158 17: [2023-04-27 00:01:39,430] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 142 19: [2023-04-27 00:01:39,431] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-27 00:01:39,431] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 157 17: [2023-04-27 00:01:39,432] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 
17: [2023-04-27 00:01:39,432] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 136 26: [2023-04-27 00:01:39,436] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-27 00:01:39,436] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 210 14: [2023-04-27 00:01:39,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 3: [2023-04-27 00:01:39,439] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 14: [2023-04-27 00:01:39,439] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 116 3: [2023-04-27 00:01:39,439] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 25 19: [2023-04-27 00:01:39,441] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 19: [2023-04-27 00:01:39,442] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 153 19: [2023-04-27 00:01:39,443] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 
19: [2023-04-27 00:01:39,444] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 154 16: [2023-04-27 00:01:39,444] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 132 19: [2023-04-27 00:01:39,444] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-27 00:01:39,445] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 159 19: [2023-04-27 00:01:39,445] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-27 00:01:39,446] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 152 2: [2023-04-27 00:01:39,455] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-04-27 00:01:39,456] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 22 14: [2023-04-27 00:01:39,461] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from checkpoints_1b1250b1b5/global_step140000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 
14: [2023-04-27 00:01:39,462] [INFO] [engine.py:2844:_get_all_zero_checkpoint_state_dicts] successfully read 256 ZeRO state_dicts for rank 112 16: [2023-04-27 00:01:39,501] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 135 1: [2023-04-27 00:01:39,535] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 13 28: [2023-04-27 00:01:39,541] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 227 5: [2023-04-27 00:01:39,561] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 44 27: [2023-04-27 00:01:39,564] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 218 20: [2023-04-27 00:01:39,567] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 160 16: [2023-04-27 00:01:39,572] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 131 30: [2023-04-27 00:01:39,573] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 246 30: [2023-04-27 00:01:39,579] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 247 24: [2023-04-27 00:01:39,597] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 198 31: [2023-04-27 00:01:39,599] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 252 3: [2023-04-27 00:01:39,599] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 26 10: [2023-04-27 00:01:39,601] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 82 27: [2023-04-27 00:01:39,602] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 217 9: [2023-04-27 00:01:39,611] [INFO] 
[engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 78 18: [2023-04-27 00:01:39,612] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 145 24: [2023-04-27 00:01:39,619] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 196 0: [2023-04-27 00:01:39,620] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 0 0: checkpoint version 3.0 1: [2023-04-27 00:01:39,624] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 9 11: [2023-04-27 00:01:39,625] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 88 31: [2023-04-27 00:01:39,633] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 254 17: [2023-04-27 00:01:39,634] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 140 15: [2023-04-27 00:01:39,636] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 122 18: [2023-04-27 00:01:39,637] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 146 22: [2023-04-27 00:01:39,637] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 176 11: [2023-04-27 00:01:39,638] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 92 31: [2023-04-27 00:01:39,639] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 249 13: [2023-04-27 00:01:39,639] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 110 7: [2023-04-27 00:01:39,645] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 57 4: [2023-04-27 00:01:39,646] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition 
checkpoints for rank 32 16: [2023-04-27 00:01:39,650] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 128 11: [2023-04-27 00:01:39,651] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 91 10: [2023-04-27 00:01:39,652] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 87 1: [2023-04-27 00:01:39,652] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 14 3: [2023-04-27 00:01:39,653] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 30 4: [2023-04-27 00:01:39,654] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 39 2: [2023-04-27 00:01:39,656] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 23 29: [2023-04-27 00:01:39,657] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 237 1: [2023-04-27 00:01:39,657] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 8 31: [2023-04-27 00:01:39,659] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 253 16: [2023-04-27 00:01:39,659] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 130 15: [2023-04-27 00:01:39,659] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 127 11: [2023-04-27 00:01:39,662] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 90 30: [2023-04-27 00:01:39,663] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 241 16: [2023-04-27 00:01:39,663] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 133 12: [2023-04-27 00:01:39,663] [INFO] 
[engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 101 10: [2023-04-27 00:01:39,665] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 80 31: [2023-04-27 00:01:39,666] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 255 16: [2023-04-27 00:01:39,666] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 129 9: [2023-04-27 00:01:39,669] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 75 16: [2023-04-27 00:01:39,670] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 134 13: [2023-04-27 00:01:39,670] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 111 12: [2023-04-27 00:01:39,671] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 97 6: [2023-04-27 00:01:39,673] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 51 21: [2023-04-27 00:01:39,673] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 170 1: [2023-04-27 00:01:39,674] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 12 18: [2023-04-27 00:01:39,661] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 144 1: [2023-04-27 00:01:39,677] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 11 24: [2023-04-27 00:01:39,679] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 194 11: [2023-04-27 00:01:39,680] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 89 27: [2023-04-27 00:01:39,681] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 220 
4: [2023-04-27 00:01:39,683] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 34 30: [2023-04-27 00:01:39,685] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 240 21: [2023-04-27 00:01:39,685] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 169 25: [2023-04-27 00:01:39,685] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 206 15: [2023-04-27 00:01:39,686] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 121 24: [2023-04-27 00:01:39,690] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 193 11: [2023-04-27 00:01:39,691] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 94 31: [2023-04-27 00:01:39,692] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 251 1: [2023-04-27 00:01:39,692] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 15 18: [2023-04-27 00:01:39,693] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 149 29: [2023-04-27 00:01:39,694] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 234 17: [2023-04-27 00:01:39,694] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 138 7: [2023-04-27 00:01:39,695] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 59 30: [2023-04-27 00:01:39,696] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 243 30: [2023-04-27 00:01:39,696] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 245 2: [2023-04-27 00:01:39,697] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 
zero partition checkpoints for rank 16 15: [2023-04-27 00:01:39,697] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 126 3: [2023-04-27 00:01:39,701] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 27 23: [2023-04-27 00:01:39,700] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 184 17: [2023-04-27 00:01:39,702] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 141 5: [2023-04-27 00:01:39,702] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 46 12: [2023-04-27 00:01:39,706] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 102 8: [2023-04-27 00:01:39,707] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 66 13: [2023-04-27 00:01:39,707] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 107 31: [2023-04-27 00:01:39,708] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 248 22: [2023-04-27 00:01:39,708] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 183 23: [2023-04-27 00:01:39,708] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 188 28: [2023-04-27 00:01:39,709] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 230 21: [2023-04-27 00:01:39,709] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 175 18: [2023-04-27 00:01:39,709] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 148 1: [2023-04-27 00:01:39,710] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 10 26: [2023-04-27 00:01:39,712] [INFO] 
[engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 209 24: [2023-04-27 00:01:39,713] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 195 25: [2023-04-27 00:01:39,713] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 204 0: [2023-04-27 00:01:39,714] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 6 28: [2023-04-27 00:01:39,714] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 231 27: [2023-04-27 00:01:39,717] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 219 6: [2023-04-27 00:01:39,717] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 50 15: [2023-04-27 00:01:39,718] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 123 13: [2023-04-27 00:01:39,718] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 104 27: [2023-04-27 00:01:39,718] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 216 22: [2023-04-27 00:01:39,719] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 177 0: [2023-04-27 00:01:39,719] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 1 6: [2023-04-27 00:01:39,720] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 49 25: [2023-04-27 00:01:39,720] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 207 9: [2023-04-27 00:01:39,720] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 77 18: [2023-04-27 00:01:39,721] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 151 
30: [2023-04-27 00:01:39,721] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 242 18: [2023-04-27 00:01:39,722] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 150 30: [2023-04-27 00:01:39,722] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 244 4: [2023-04-27 00:01:39,724] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 35 4: [2023-04-27 00:01:39,724] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 36 11: [2023-04-27 00:01:39,724] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 93 5: [2023-04-27 00:01:39,725] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 45 11: [2023-04-27 00:01:39,725] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 95 24: [2023-04-27 00:01:39,726] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 199 9: [2023-04-27 00:01:39,727] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 72 31: [2023-04-27 00:01:39,727] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 250 24: [2023-04-27 00:01:39,727] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 192 6: [2023-04-27 00:01:39,727] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 54 13: [2023-04-27 00:01:39,731] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 108 4: [2023-04-27 00:01:39,731] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 38 8: [2023-04-27 00:01:39,733] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero 
partition checkpoints for rank 65 8: [2023-04-27 00:01:39,735] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 68 3: [2023-04-27 00:01:39,735] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 25 24: [2023-04-27 00:01:39,737] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 197 15: [2023-04-27 00:01:39,739] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 124 13: [2023-04-27 00:01:39,740] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 106 14: [2023-04-27 00:01:39,741] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 115 21: [2023-04-27 00:01:39,743] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 168 28: [2023-04-27 00:01:39,744] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 224 9: [2023-04-27 00:01:39,745] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 74 27: [2023-04-27 00:01:39,746] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 221 14: [2023-04-27 00:01:39,747] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 117 27: [2023-04-27 00:01:39,747] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 223 0: [2023-04-27 00:01:39,749] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 4 23: [2023-04-27 00:01:39,750] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 187 15: [2023-04-27 00:01:39,751] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 125 19: [2023-04-27 00:01:39,751] [INFO] 
[engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 152 22: [2023-04-27 00:01:39,753] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 179 18: [2023-04-27 00:01:39,754] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 147 10: [2023-04-27 00:01:39,755] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 83 26: [2023-04-27 00:01:39,756] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 212 10: [2023-04-27 00:01:39,756] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 81 20: [2023-04-27 00:01:39,756] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 164 25: [2023-04-27 00:01:39,757] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 201 22: [2023-04-27 00:01:39,758] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 180 26: [2023-04-27 00:01:39,758] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 211 7: [2023-04-27 00:01:39,758] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 63 29: [2023-04-27 00:01:39,759] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 232 14: [2023-04-27 00:01:39,759] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 119 9: [2023-04-27 00:01:39,764] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 73 8: [2023-04-27 00:01:39,765] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 64 25: [2023-04-27 00:01:39,765] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 
203 14: [2023-04-27 00:01:39,766] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 118 0: [2023-04-27 00:01:39,767] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 7 23: [2023-04-27 00:01:39,767] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 190 9: [2023-04-27 00:01:39,769] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 79 23: [2023-04-27 00:01:39,769] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 185 19: [2023-04-27 00:01:39,771] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 157 0: [2023-04-27 00:01:39,772] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 2 12: [2023-04-27 00:01:39,772] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 98 28: [2023-04-27 00:01:39,772] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 228 5: [2023-04-27 00:01:39,774] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 42 8: [2023-04-27 00:01:39,774] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 67 5: [2023-04-27 00:01:39,774] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 40 12: [2023-04-27 00:01:39,776] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 100 4: [2023-04-27 00:01:39,777] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 33 28: [2023-04-27 00:01:39,778] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 226 17: [2023-04-27 00:01:39,781] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero 
partition checkpoints for rank 139 15: [2023-04-27 00:01:39,782] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 120 21: [2023-04-27 00:01:39,782] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 174 10: [2023-04-27 00:01:39,782] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 86 20: [2023-04-27 00:01:39,783] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 167 25: [2023-04-27 00:01:39,785] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 202 7: [2023-04-27 00:01:39,785] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 61 19: [2023-04-27 00:01:39,785] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 158 29: [2023-04-27 00:01:39,785] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 238 2: [2023-04-27 00:01:39,787] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 19 4: [2023-04-27 00:01:39,787] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 37 25: [2023-04-27 00:01:39,788] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 200 5: [2023-04-27 00:01:39,790] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 43 26: [2023-04-27 00:01:39,790] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 215 17: [2023-04-27 00:01:39,790] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 136 22: [2023-04-27 00:01:39,790] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 181 20: [2023-04-27 00:01:39,791] [INFO] 
[engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 163 8: [2023-04-27 00:01:39,792] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 70 6: [2023-04-27 00:01:39,792] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 48 17: [2023-04-27 00:01:39,794] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 143 8: [2023-04-27 00:01:39,795] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 69 26: [2023-04-27 00:01:39,796] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 214 3: [2023-04-27 00:01:39,796] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 31 25: [2023-04-27 00:01:39,797] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 205 6: [2023-04-27 00:01:39,797] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 53 10: [2023-04-27 00:01:39,798] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 84 28: [2023-04-27 00:01:39,799] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 225 29: [2023-04-27 00:01:39,801] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 236 28: [2023-04-27 00:01:39,802] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 229 19: [2023-04-27 00:01:39,802] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 153 0: [2023-04-27 00:01:39,803] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 3 7: [2023-04-27 00:01:39,803] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 60 5: 
[2023-04-27 00:01:39,804] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 41 0: [2023-04-27 00:01:39,806] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 5 23: [2023-04-27 00:01:39,806] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 189 23: [2023-04-27 00:01:39,806] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 186 20: [2023-04-27 00:01:39,806] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 166 14: [2023-04-27 00:01:39,809] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 113 21: [2023-04-27 00:01:39,811] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 171 29: [2023-04-27 00:01:39,811] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 239 10: [2023-04-27 00:01:39,811] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 85 7: [2023-04-27 00:01:39,811] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 62 2: [2023-04-27 00:01:39,811] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 17 3: [2023-04-27 00:01:39,812] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 24 22: [2023-04-27 00:01:39,812] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 182 21: [2023-04-27 00:01:39,816] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 173 26: [2023-04-27 00:01:39,816] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 210 7: [2023-04-27 00:01:39,818] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero 
partition checkpoints for rank 58 29: [2023-04-27 00:01:39,820] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 235 23: [2023-04-27 00:01:39,821] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 191 5: [2023-04-27 00:01:39,821] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 47 29: [2023-04-27 00:01:39,821] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 233 21: [2023-04-27 00:01:39,822] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 172 12: [2023-04-27 00:01:39,823] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 103 12: [2023-04-27 00:01:39,823] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 96 8: [2023-04-27 00:01:39,823] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 71 27: [2023-04-27 00:01:39,825] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 222 22: [2023-04-27 00:01:39,826] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 178 12: [2023-04-27 00:01:39,829] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 99 20: [2023-04-27 00:01:39,831] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 162 6: [2023-04-27 00:01:39,832] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 52 13: [2023-04-27 00:01:39,836] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 105 2: [2023-04-27 00:01:39,838] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 18 26: [2023-04-27 00:01:39,840] [INFO] 
[engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 213 14: [2023-04-27 00:01:39,840] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 114 3: [2023-04-27 00:01:39,842] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 29 3: [2023-04-27 00:01:39,842] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 28 14: [2023-04-27 00:01:39,842] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 116 2: [2023-04-27 00:01:39,843] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 21 20: [2023-04-27 00:01:39,844] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 161 6: [2023-04-27 00:01:39,846] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 55 7: [2023-04-27 00:01:39,847] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 56 20: [2023-04-27 00:01:39,847] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 165 13: [2023-04-27 00:01:39,848] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 109 17: [2023-04-27 00:01:39,852] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 142 19: [2023-04-27 00:01:39,853] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 155 26: [2023-04-27 00:01:39,854] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 208 9: [2023-04-27 00:01:39,854] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 76 2: [2023-04-27 00:01:39,856] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 20 
14: [2023-04-27 00:01:39,857] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 112 2: [2023-04-27 00:01:39,872] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 22 19: [2023-04-27 00:01:39,873] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 156 17: [2023-04-27 00:01:39,877] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 137 19: [2023-04-27 00:01:39,898] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 159 19: [2023-04-27 00:01:39,900] [INFO] [engine.py:2784:_load_zero_checkpoint] loading 256 zero partition checkpoints for rank 154 0: successfully loaded checkpoint from checkpoints_1b1250b1b5 at iteration 140000 31: time (ms) | load-checkpoint: 8980.59 0: estimated model parameters: 1.096338432 0: estimated model parameters without embeddings: 1.002523648 0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-04-27 00:01:40 0: > building train, validation, and test datasets ... 0: > datasets target sizes (minimum size): 0: train: 122070313 0: validation: 12288 0: test: 256 0: > building train, validation, and test datasets for GPT ... 0: > building dataset index ... 0: reading sizes... 0: reading pointers... 0: reading document index... 0: creating numpy buffer of mmap... 0: creating memory view of numpy buffer... 
0: > finished creating indexed dataset in 0.008735 seconds 0: number of documents: 3133972 0: > dataset split: 0: train: 0: document indices in [0, 3133972) total of 3133972 documents 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document_train_indexmap_122070313ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document_train_indexmap_122070313ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_1B5_text_document_train_indexmap_122070313ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.113 seconds 0: total number of samples: 122077260 0: total number of epochs: 167 0: > building dataset index ... 0: reading sizes... 0: reading pointers... 0: reading document index... 0: creating numpy buffer of mmap... 0: creating memory view of numpy buffer... 0: > finished creating indexed dataset in 0.095052 seconds 0: number of documents: 364608 0: > dataset split: 0: validation: 0: document indices in [0, 364608) total of 364608 documents 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_12288ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_12288ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_12288ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.086 seconds 0: total number of samples: 84978 0: total number of epochs: 1 0: > finished creating GPT datasets ... 0: [after dataloaders are built] datetime: 2023-04-27 00:01:48 0: done with setup ... 0: training ... 
0: Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: 31: time (ms) | model-and-optimizer-setup: 29415.09 | train/valid/test-data-iterators-setup: 7852.13 0: [000-000] 1.0963B / 1.0025B 0: [before the start of training step] datetime: 2023-04-27 00:01:48 0: [Rank 0] (after 140100 iterations) memory (MB) | allocated: 8829.99951171875 | max allocated: 22735.109375 | reserved: 25882.0 | max reserved: 25882.0 31: iteration 140100/ 476837 | consumed samples: 35865600 | consumed tokens: 73452748800 | elapsed time per iteration (s): 0.81 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 2.622180E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 314.233 | TFLOPs: 19.01 | 31: iteration 140200/ 476837 | consumed samples: 35891200 | consumed tokens: 73505177600 | elapsed time per iteration (s): 0.68 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 2.629785E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.648 | TFLOPs: 22.73 | 31: iteration 140300/ 476837 | consumed samples: 35916800 | consumed tokens: 73557606400 | elapsed time per iteration (s): 0.68 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 2.628555E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.678 | TFLOPs: 22.73 | 31: iteration 140400/ 476837 | consumed samples: 35942400 | consumed tokens: 73610035200 | elapsed time per iteration (s): 0.68 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 2.623374E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.910 | TFLOPs: 22.74 | 31: iteration 140500/ 476837 | consumed samples: 35968000 | consumed tokens: 73662464000 | elapsed time per iteration (s): 0.68 | learning rate: 1.657E-04 | 
global batch size: 256 | lm loss: 2.621288E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.022 | TFLOPs: 22.75 | 31: iteration 140600/ 476837 | consumed samples: 35993600 | consumed tokens: 73714892800 | elapsed time per iteration (s): 0.68 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 2.620478E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.997 | TFLOPs: 22.75 | 31: iteration 140700/ 476837 | consumed samples: 36019200 | consumed tokens: 73767321600 | elapsed time per iteration (s): 0.68 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 2.626695E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.089 | TFLOPs: 22.75 | 31: iteration 140800/ 476837 | consumed samples: 36044800 | consumed tokens: 73819750400 | elapsed time per iteration (s): 0.68 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 2.623370E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.034 | TFLOPs: 22.75 | 31: iteration 140900/ 476837 | consumed samples: 36070400 | consumed tokens: 73872179200 | elapsed time per iteration (s): 0.68 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 2.622136E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.125 | TFLOPs: 22.75 | 31: iteration 141000/ 476837 | consumed samples: 36096000 | consumed tokens: 73924608000 | elapsed time per iteration (s): 0.68 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 2.617227E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.144 | TFLOPs: 22.76 | 31: iteration 141100/ 476837 | consumed 
samples: 36121600 | consumed tokens: 73977036800 | elapsed time per iteration (s): 0.68 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 2.624265E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.179 | TFLOPs: 22.76 | 31: iteration 141200/ 476837 | consumed samples: 36147200 | consumed tokens: 74029465600 | elapsed time per iteration (s): 0.68 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 2.626507E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.739 | TFLOPs: 22.61 | 31: iteration 141300/ 476837 | consumed samples: 36172800 | consumed tokens: 74081894400 | elapsed time per iteration (s): 0.68 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 2.619826E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.626 | TFLOPs: 22.72 | 31: iteration 141400/ 476837 | consumed samples: 36198400 | consumed tokens: 74134323200 | elapsed time per iteration (s): 0.68 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 2.624040E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.012 | TFLOPs: 22.75 | 31: iteration 141500/ 476837 | consumed samples: 36224000 | consumed tokens: 74186752000 | elapsed time per iteration (s): 0.68 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 2.623494E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.022 | TFLOPs: 22.75 | 31: iteration 141600/ 476837 | consumed samples: 36249600 | consumed tokens: 74239180800 | elapsed time per iteration (s): 0.68 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 2.621128E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 376.015 | TFLOPs: 22.75 | 31: iteration 141700/ 476837 | consumed samples: 36275200 | consumed tokens: 74291609600 | elapsed time per iteration (s): 0.68 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 2.620310E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.940 | TFLOPs: 22.74 | 31: iteration 141800/ 476837 | consumed samples: 36300800 | consumed tokens: 74344038400 | elapsed time per iteration (s): 0.68 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 2.620909E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.950 | TFLOPs: 22.74 | 31: iteration 141900/ 476837 | consumed samples: 36326400 | consumed tokens: 74396467200 | elapsed time per iteration (s): 0.68 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 2.625899E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.173 | TFLOPs: 22.76 | 0: [2023-04-27 00:24:43,979] [INFO] [logging.py:68:log_dist] [Rank 0] step=142000, skipped=0, lr=[0.00016500452980903118, 0.00016500452980903118, 0.00016500452980903118], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 142000/ 476837 | consumed samples: 36352000 | consumed tokens: 74448896000 | elapsed time per iteration (s): 0.68 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 2.628636E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.271 | TFLOPs: 22.76 | 0: steps: 142000 loss: 2.6281 iter time (s): 0.685 samples/sec: 373.962 31: iteration 142100/ 476837 | consumed samples: 36377600 | consumed tokens: 74501324800 | elapsed time per iteration (s): 0.68 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 2.623369E+00 | grad norm: 0.323 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.234 | TFLOPs: 22.76 | 31: iteration 142200/ 476837 | consumed samples: 36403200 | consumed tokens: 74553753600 | elapsed time per iteration (s): 0.68 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 2.622387E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.068 | TFLOPs: 22.75 | 31: iteration 142300/ 476837 | consumed samples: 36428800 | consumed tokens: 74606182400 | elapsed time per iteration (s): 0.68 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 2.622320E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.420 | TFLOPs: 22.77 | 31: iteration 142400/ 476837 | consumed samples: 36454400 | consumed tokens: 74658611200 | elapsed time per iteration (s): 0.68 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 2.619338E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.218 | TFLOPs: 22.64 | 31: iteration 142500/ 476837 | consumed samples: 36480000 | consumed tokens: 74711040000 | elapsed time per iteration (s): 0.68 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 2.623142E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.725 | TFLOPs: 22.73 | 31: iteration 142600/ 476837 | consumed samples: 36505600 | consumed tokens: 74763468800 | elapsed time per iteration (s): 0.68 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 2.618951E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.134 | TFLOPs: 22.76 | 31: iteration 142700/ 476837 | consumed samples: 36531200 | consumed tokens: 74815897600 | elapsed time per 
iteration (s): 0.68 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 2.618664E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.149 | TFLOPs: 22.76 | 31: iteration 142800/ 476837 | consumed samples: 36556800 | consumed tokens: 74868326400 | elapsed time per iteration (s): 0.68 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 2.624492E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.098 | TFLOPs: 22.75 | 31: iteration 142900/ 476837 | consumed samples: 36582400 | consumed tokens: 74920755200 | elapsed time per iteration (s): 0.68 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 2.618877E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.039 | TFLOPs: 22.75 | 31: iteration 143000/ 476837 | consumed samples: 36608000 | consumed tokens: 74973184000 | elapsed time per iteration (s): 0.68 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 2.622700E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.170 | TFLOPs: 22.76 | 31: iteration 143100/ 476837 | consumed samples: 36633600 | consumed tokens: 75025612800 | elapsed time per iteration (s): 0.68 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 2.621429E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.120 | TFLOPs: 22.75 | 31: iteration 143200/ 476837 | consumed samples: 36659200 | consumed tokens: 75078041600 | elapsed time per iteration (s): 0.68 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 2.617838E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.083 | TFLOPs: 
22.75 | 31: iteration 143300/ 476837 | consumed samples: 36684800 | consumed tokens: 75130470400 | elapsed time per iteration (s): 0.68 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 2.622932E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.048 | TFLOPs: 22.69 | 31: iteration 143400/ 476837 | consumed samples: 36710400 | consumed tokens: 75182899200 | elapsed time per iteration (s): 0.68 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 2.619846E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.064 | TFLOPs: 22.75 | 31: iteration 143500/ 476837 | consumed samples: 36736000 | consumed tokens: 75235328000 | elapsed time per iteration (s): 0.68 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 2.616735E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.096 | TFLOPs: 22.75 | 31: iteration 143600/ 476837 | consumed samples: 36761600 | consumed tokens: 75287756800 | elapsed time per iteration (s): 0.68 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 2.613972E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.223 | TFLOPs: 22.76 | 31: iteration 143700/ 476837 | consumed samples: 36787200 | consumed tokens: 75340185600 | elapsed time per iteration (s): 0.68 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 2.623038E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.815 | TFLOPs: 22.61 | 31: iteration 143800/ 476837 | consumed samples: 36812800 | consumed tokens: 75392614400 | elapsed time per iteration (s): 0.68 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 2.620574E+00 | grad norm: 0.365 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.040 | TFLOPs: 22.75 | 31: iteration 143900/ 476837 | consumed samples: 36838400 | consumed tokens: 75445043200 | elapsed time per iteration (s): 0.68 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 2.639704E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.781 | TFLOPs: 22.73 | 0: [2023-04-27 00:47:26,281] [INFO] [logging.py:68:log_dist] [Rank 0] step=144000, skipped=0, lr=[0.00016405154835097527, 0.00016405154835097527, 0.00016405154835097527], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 144000/ 476837 | consumed samples: 36864000 | consumed tokens: 75497472000 | elapsed time per iteration (s): 0.68 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 2.635630E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.190 | TFLOPs: 22.76 | 0: steps: 144000 loss: 2.6474 iter time (s): 0.678 samples/sec: 377.718 31: iteration 144100/ 476837 | consumed samples: 36889600 | consumed tokens: 75549900800 | elapsed time per iteration (s): 0.68 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 2.622814E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.269 | TFLOPs: 22.76 | 31: iteration 144200/ 476837 | consumed samples: 36915200 | consumed tokens: 75602329600 | elapsed time per iteration (s): 0.68 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 2.622951E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.238 | TFLOPs: 22.76 | 31: iteration 144300/ 476837 | consumed samples: 36940800 | consumed tokens: 75654758400 | elapsed time per iteration (s): 0.68 | learning rate: 1.639E-04 | global batch size: 
256 | lm loss: 2.623369E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.287 | TFLOPs: 22.76 | 31: iteration 144400/ 476837 | consumed samples: 36966400 | consumed tokens: 75707187200 | elapsed time per iteration (s): 0.68 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 2.613678E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.262 | TFLOPs: 22.76 | 31: iteration 144500/ 476837 | consumed samples: 36992000 | consumed tokens: 75759616000 | elapsed time per iteration (s): 0.68 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 2.616571E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.287 | TFLOPs: 22.76 | 31: iteration 144600/ 476837 | consumed samples: 37017600 | consumed tokens: 75812044800 | elapsed time per iteration (s): 0.68 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 2.615966E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.193 | TFLOPs: 22.76 | 31: iteration 144700/ 476837 | consumed samples: 37043200 | consumed tokens: 75864473600 | elapsed time per iteration (s): 0.68 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 2.623800E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.147 | TFLOPs: 22.76 | 31: iteration 144800/ 476837 | consumed samples: 37068800 | consumed tokens: 75916902400 | elapsed time per iteration (s): 0.68 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 2.620463E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.195 | TFLOPs: 22.76 | 31: iteration 144900/ 476837 | consumed samples: 37094400 | 
consumed tokens: 75969331200 | elapsed time per iteration (s): 0.68 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 2.619186E+00 | grad norm: 0.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.183 | TFLOPs: 22.76 | 31: iteration 145000/ 476837 | consumed samples: 37120000 | consumed tokens: 76021760000 | elapsed time per iteration (s): 0.69 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 2.619320E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.463 | TFLOPs: 22.59 | 31: iteration 145100/ 476837 | consumed samples: 37145600 | consumed tokens: 76074188800 | elapsed time per iteration (s): 0.68 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 2.622119E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.821 | TFLOPs: 22.74 | 31: iteration 145200/ 476837 | consumed samples: 37171200 | consumed tokens: 76126617600 | elapsed time per iteration (s): 0.68 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 2.615952E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.096 | TFLOPs: 22.75 | 31: iteration 145300/ 476837 | consumed samples: 37196800 | consumed tokens: 76179046400 | elapsed time per iteration (s): 0.68 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 2.618066E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.366 | TFLOPs: 22.77 | 31: iteration 145400/ 476837 | consumed samples: 37222400 | consumed tokens: 76231475200 | elapsed time per iteration (s): 0.68 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 2.614467E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 376.532 | TFLOPs: 22.78 | 31: iteration 145500/ 476837 | consumed samples: 37248000 | consumed tokens: 76283904000 | elapsed time per iteration (s): 0.68 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 2.616960E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.386 | TFLOPs: 22.77 | 31: iteration 145600/ 476837 | consumed samples: 37273600 | consumed tokens: 76336332800 | elapsed time per iteration (s): 0.69 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 2.623961E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.044 | TFLOPs: 22.39 | 31: iteration 145700/ 476837 | consumed samples: 37299200 | consumed tokens: 76388761600 | elapsed time per iteration (s): 0.69 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 2.617332E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.178 | TFLOPs: 22.58 | 31: iteration 145800/ 476837 | consumed samples: 37324800 | consumed tokens: 76441190400 | elapsed time per iteration (s): 0.69 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 2.615803E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.766 | TFLOPs: 22.37 | 31: iteration 145900/ 476837 | consumed samples: 37350400 | consumed tokens: 76493619200 | elapsed time per iteration (s): 0.68 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 2.615887E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.856 | TFLOPs: 22.62 | 0: [2023-04-27 01:10:10,923] [INFO] [logging.py:68:log_dist] [Rank 0] step=146000, skipped=0, lr=[0.00016308899162601018, 0.00016308899162601018, 0.00016308899162601018], 
mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 146000/ 476837 | consumed samples: 37376000 | consumed tokens: 76546048000 | elapsed time per iteration (s): 0.68 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 2.617925E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.455 | TFLOPs: 22.77 | 0: steps: 146000 loss: 2.6173 iter time (s): 0.679 samples/sec: 376.988 31: iteration 146100/ 476837 | consumed samples: 37401600 | consumed tokens: 76598476800 | elapsed time per iteration (s): 0.68 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 2.619136E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.416 | TFLOPs: 22.77 | 31: iteration 146200/ 476837 | consumed samples: 37427200 | consumed tokens: 76650905600 | elapsed time per iteration (s): 0.68 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 2.618979E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.472 | TFLOPs: 22.78 | 31: iteration 146300/ 476837 | consumed samples: 37452800 | consumed tokens: 76703334400 | elapsed time per iteration (s): 0.69 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 2.612047E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.265 | TFLOPs: 22.58 | 31: iteration 146400/ 476837 | consumed samples: 37478400 | consumed tokens: 76755763200 | elapsed time per iteration (s): 0.68 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 2.615193E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.475 | TFLOPs: 22.78 | 31: iteration 146500/ 476837 | consumed samples: 37504000 | consumed tokens: 76808192000 | elapsed time per iteration 
(s): 0.68 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 2.618565E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.509 | TFLOPs: 22.78 | 31: iteration 146600/ 476837 | consumed samples: 37529600 | consumed tokens: 76860620800 | elapsed time per iteration (s): 0.68 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 2.619786E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.469 | TFLOPs: 22.78 | 31: iteration 146700/ 476837 | consumed samples: 37555200 | consumed tokens: 76913049600 | elapsed time per iteration (s): 0.68 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 2.614925E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.593 | TFLOPs: 22.78 | 31: iteration 146800/ 476837 | consumed samples: 37580800 | consumed tokens: 76965478400 | elapsed time per iteration (s): 0.68 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 2.616208E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.616 | TFLOPs: 22.78 | 31: iteration 146900/ 476837 | consumed samples: 37606400 | consumed tokens: 77017907200 | elapsed time per iteration (s): 0.68 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 2.622501E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.729 | TFLOPs: 22.79 | 31: iteration 147000/ 476837 | consumed samples: 37632000 | consumed tokens: 77070336000 | elapsed time per iteration (s): 0.68 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 2.618693E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.696 | TFLOPs: 22.79 | 31: 
iteration 147100/ 476837 | consumed samples: 37657600 | consumed tokens: 77122764800 | elapsed time per iteration (s): 0.68 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 2.616870E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.650 | TFLOPs: 22.79 | 31: iteration 147200/ 476837 | consumed samples: 37683200 | consumed tokens: 77175193600 | elapsed time per iteration (s): 0.68 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 2.615673E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.383 | TFLOPs: 22.77 | 31: iteration 147300/ 476837 | consumed samples: 37708800 | consumed tokens: 77227622400 | elapsed time per iteration (s): 0.68 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 2.615449E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.676 | TFLOPs: 22.79 | 31: iteration 147400/ 476837 | consumed samples: 37734400 | consumed tokens: 77280051200 | elapsed time per iteration (s): 0.68 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 2.617491E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.608 | TFLOPs: 22.78 | 31: iteration 147500/ 476837 | consumed samples: 37760000 | consumed tokens: 77332480000 | elapsed time per iteration (s): 0.68 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 2.613521E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.609 | TFLOPs: 22.78 | 31: iteration 147600/ 476837 | consumed samples: 37785600 | consumed tokens: 77384908800 | elapsed time per iteration (s): 0.69 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 2.617039E+00 | grad norm: 0.287 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.704 | TFLOPs: 22.61 | 31: iteration 147700/ 476837 | consumed samples: 37811200 | consumed tokens: 77437337600 | elapsed time per iteration (s): 0.68 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 2.615240E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.537 | TFLOPs: 22.78 | 31: iteration 147800/ 476837 | consumed samples: 37836800 | consumed tokens: 77489766400 | elapsed time per iteration (s): 0.68 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 2.615137E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.539 | TFLOPs: 22.78 | 31: iteration 147900/ 476837 | consumed samples: 37862400 | consumed tokens: 77542195200 | elapsed time per iteration (s): 0.68 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 2.617265E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.491 | TFLOPs: 22.78 | 0: [2023-04-27 01:32:51,744] [INFO] [logging.py:68:log_dist] [Rank 0] step=148000, skipped=0, lr=[0.0001621170301516851, 0.0001621170301516851, 0.0001621170301516851], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 148000/ 476837 | consumed samples: 37888000 | consumed tokens: 77594624000 | elapsed time per iteration (s): 0.68 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 2.615589E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.473 | TFLOPs: 22.78 | 0: steps: 148000 loss: 2.6415 iter time (s): 0.677 samples/sec: 378.201 31: iteration 148100/ 476837 | consumed samples: 37913600 | consumed tokens: 77647052800 | elapsed time per iteration (s): 0.73 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 
2.611913E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 351.991 | TFLOPs: 21.29 | 31: iteration 148200/ 476837 | consumed samples: 37939200 | consumed tokens: 77699481600 | elapsed time per iteration (s): 0.70 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 2.618203E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.480 | TFLOPs: 22.11 | 31: iteration 148300/ 476837 | consumed samples: 37964800 | consumed tokens: 77751910400 | elapsed time per iteration (s): 0.68 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 2.615227E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.484 | TFLOPs: 22.78 | 31: iteration 148400/ 476837 | consumed samples: 37990400 | consumed tokens: 77804339200 | elapsed time per iteration (s): 0.68 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 2.614022E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.454 | TFLOPs: 22.77 | 31: iteration 148500/ 476837 | consumed samples: 38016000 | consumed tokens: 77856768000 | elapsed time per iteration (s): 0.68 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 2.615800E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.513 | TFLOPs: 22.78 | 31: iteration 148600/ 476837 | consumed samples: 38041600 | consumed tokens: 77909196800 | elapsed time per iteration (s): 0.68 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 2.615254E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.521 | TFLOPs: 22.78 | 31: iteration 148700/ 476837 | consumed samples: 38067200 | consumed tokens: 
77961625600 | elapsed time per iteration (s): 0.68 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 2.615108E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.424 | TFLOPs: 22.77 | 31: iteration 148800/ 476837 | consumed samples: 38092800 | consumed tokens: 78014054400 | elapsed time per iteration (s): 0.68 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 2.615536E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.491 | TFLOPs: 22.78 | 31: iteration 148900/ 476837 | consumed samples: 38118400 | consumed tokens: 78066483200 | elapsed time per iteration (s): 0.68 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 2.618054E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.143 | TFLOPs: 22.76 | 31: iteration 149000/ 476837 | consumed samples: 38144000 | consumed tokens: 78118912000 | elapsed time per iteration (s): 0.69 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 2.611770E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.687 | TFLOPs: 22.61 | 31: iteration 149100/ 476837 | consumed samples: 38169600 | consumed tokens: 78171340800 | elapsed time per iteration (s): 0.68 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 2.613175E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.426 | TFLOPs: 22.77 | 31: iteration 149200/ 476837 | consumed samples: 38195200 | consumed tokens: 78223769600 | elapsed time per iteration (s): 0.68 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 2.617241E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 376.485 | TFLOPs: 22.78 | 31: iteration 149300/ 476837 | consumed samples: 38220800 | consumed tokens: 78276198400 | elapsed time per iteration (s): 0.68 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 2.616371E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.457 | TFLOPs: 22.77 | 31: iteration 149400/ 476837 | consumed samples: 38246400 | consumed tokens: 78328627200 | elapsed time per iteration (s): 0.68 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 2.618661E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.458 | TFLOPs: 22.77 | 31: iteration 149500/ 476837 | consumed samples: 38272000 | consumed tokens: 78381056000 | elapsed time per iteration (s): 0.68 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 2.614429E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.496 | TFLOPs: 22.78 | 31: iteration 149600/ 476837 | consumed samples: 38297600 | consumed tokens: 78433484800 | elapsed time per iteration (s): 0.68 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 2.618521E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.410 | TFLOPs: 22.77 | 31: iteration 149700/ 476837 | consumed samples: 38323200 | consumed tokens: 78485913600 | elapsed time per iteration (s): 0.68 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 2.613307E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.291 | TFLOPs: 22.76 | 31: iteration 149800/ 476837 | consumed samples: 38348800 | consumed tokens: 78538342400 | elapsed time per iteration (s): 0.68 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 
2.617917E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.232 | TFLOPs: 22.76 | 31: iteration 149900/ 476837 | consumed samples: 38374400 | consumed tokens: 78590771200 | elapsed time per iteration (s): 0.68 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 2.614839E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.311 | TFLOPs: 22.77 | 0: [2023-04-27 01:55:39,230] [INFO] [logging.py:68:log_dist] [Rank 0] step=150000, skipped=0, lr=[0.00016113583611160673, 0.00016113583611160673, 0.00016113583611160673], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 150000/ 476837 | consumed samples: 38400000 | consumed tokens: 78643200000 | elapsed time per iteration (s): 0.68 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 2.610096E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.264 | TFLOPs: 22.76 | 0: steps: 150000 loss: 2.5787 iter time (s): 0.680 samples/sec: 376.396 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 150000 | lm loss value: 2.934979E+00 | lm loss PPL: 1.882111E+01 | 31: ------------------------------------------------------------------------------------------------- 31: iteration 150100/ 476837 | consumed samples: 38425600 | consumed tokens: 78695628800 | elapsed time per iteration (s): 0.69 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 2.610163E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.096 | TFLOPs: 22.45 | 31: iteration 150200/ 476837 | consumed samples: 38451200 | consumed tokens: 78748057600 | elapsed time per iteration (s): 0.68 | learning rate: 1.610E-04 | global batch size: 256 | lm 
loss: 2.615918E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.349 | TFLOPs: 22.77 | 31: iteration 150300/ 476837 | consumed samples: 38476800 | consumed tokens: 78800486400 | elapsed time per iteration (s): 0.68 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 2.611874E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.737 | TFLOPs: 22.61 | 31: iteration 150400/ 476837 | consumed samples: 38502400 | consumed tokens: 78852915200 | elapsed time per iteration (s): 0.68 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 2.609640E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.569 | TFLOPs: 22.78 | 31: iteration 150500/ 476837 | consumed samples: 38528000 | consumed tokens: 78905344000 | elapsed time per iteration (s): 0.68 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 2.614886E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.624 | TFLOPs: 22.78 | 31: iteration 150600/ 476837 | consumed samples: 38553600 | consumed tokens: 78957772800 | elapsed time per iteration (s): 0.68 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 2.614488E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.518 | TFLOPs: 22.78 | 31: iteration 150700/ 476837 | consumed samples: 38579200 | consumed tokens: 79010201600 | elapsed time per iteration (s): 0.68 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 2.614368E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.689 | TFLOPs: 22.79 | 31: iteration 150800/ 476837 | consumed samples: 38604800 | consumed 
tokens: 79062630400 | elapsed time per iteration (s): 0.68 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 2.615759E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.767 | TFLOPs: 22.79 | 31: iteration 150900/ 476837 | consumed samples: 38630400 | consumed tokens: 79115059200 | elapsed time per iteration (s): 0.68 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 2.610841E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.757 | TFLOPs: 22.79 | 31: iteration 151000/ 476837 | consumed samples: 38656000 | consumed tokens: 79167488000 | elapsed time per iteration (s): 0.68 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 2.614986E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.454 | TFLOPs: 22.77 | 31: iteration 151100/ 476837 | consumed samples: 38681600 | consumed tokens: 79219916800 | elapsed time per iteration (s): 0.68 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 2.613105E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.833 | TFLOPs: 22.80 | 31: iteration 151200/ 476837 | consumed samples: 38707200 | consumed tokens: 79272345600 | elapsed time per iteration (s): 0.68 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 2.614904E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.909 | TFLOPs: 22.80 | 31: iteration 151300/ 476837 | consumed samples: 38732800 | consumed tokens: 79324774400 | elapsed time per iteration (s): 0.68 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 2.611483E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 374.841 | TFLOPs: 22.68 | 31: iteration 151400/ 476837 | consumed samples: 38758400 | consumed tokens: 79377203200 | elapsed time per iteration (s): 0.69 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 2.616654E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.323 | TFLOPs: 22.52 | 31: iteration 151500/ 476837 | consumed samples: 38784000 | consumed tokens: 79429632000 | elapsed time per iteration (s): 0.69 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 2.615568E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.219 | TFLOPs: 22.52 | 31: iteration 151600/ 476837 | consumed samples: 38809600 | consumed tokens: 79482060800 | elapsed time per iteration (s): 0.68 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 2.613676E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.844 | TFLOPs: 22.68 | 31: iteration 151700/ 476837 | consumed samples: 38835200 | consumed tokens: 79534489600 | elapsed time per iteration (s): 0.68 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 2.615861E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.733 | TFLOPs: 22.61 | 31: iteration 151800/ 476837 | consumed samples: 38860800 | consumed tokens: 79586918400 | elapsed time per iteration (s): 0.68 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 2.611838E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.764 | TFLOPs: 22.79 | 31: iteration 151900/ 476837 | consumed samples: 38886400 | consumed tokens: 79639347200 | elapsed time per iteration (s): 0.68 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 
2.611769E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.846 | TFLOPs: 22.80 | 0: [2023-04-27 02:18:23,442] [INFO] [logging.py:68:log_dist] [Rank 0] step=152000, skipped=0, lr=[0.00016014558332493682, 0.00016014558332493682, 0.00016014558332493682], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 152000/ 476837 | consumed samples: 38912000 | consumed tokens: 79691776000 | elapsed time per iteration (s): 0.69 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 2.622610E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.478 | TFLOPs: 22.59 | 0: steps: 152000 loss: 2.6480 iter time (s): 0.678 samples/sec: 377.553 31: iteration 152100/ 476837 | consumed samples: 38937600 | consumed tokens: 79744204800 | elapsed time per iteration (s): 0.68 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 2.618565E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.860 | TFLOPs: 22.80 | 31: iteration 152200/ 476837 | consumed samples: 38963200 | consumed tokens: 79796633600 | elapsed time per iteration (s): 0.68 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 2.611490E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.789 | TFLOPs: 22.79 | 31: iteration 152300/ 476837 | consumed samples: 38988800 | consumed tokens: 79849062400 | elapsed time per iteration (s): 0.68 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 2.616177E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.728 | TFLOPs: 22.79 | 31: iteration 152400/ 476837 | consumed samples: 39014400 | consumed tokens: 79901491200 | elapsed time per iteration (s): 0.68 | learning 
rate: 1.599E-04 | global batch size: 256 | lm loss: 2.617501E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.596 | TFLOPs: 22.78 | 31: iteration 152500/ 476837 | consumed samples: 39040000 | consumed tokens: 79953920000 | elapsed time per iteration (s): 0.68 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 2.619155E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.550 | TFLOPs: 22.78 | 31: iteration 152600/ 476837 | consumed samples: 39065600 | consumed tokens: 80006348800 | elapsed time per iteration (s): 0.68 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 2.614841E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.117 | TFLOPs: 22.69 | 31: iteration 152700/ 476837 | consumed samples: 39091200 | consumed tokens: 80058777600 | elapsed time per iteration (s): 0.68 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 2.616413E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.448 | TFLOPs: 22.77 | 31: iteration 152800/ 476837 | consumed samples: 39116800 | consumed tokens: 80111206400 | elapsed time per iteration (s): 0.68 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 2.608858E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.687 | TFLOPs: 22.79 | 31: iteration 152900/ 476837 | consumed samples: 39142400 | consumed tokens: 80163635200 | elapsed time per iteration (s): 0.68 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 2.613524E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.742 | TFLOPs: 22.79 | 31: iteration 153000/ 
476837 | consumed samples: 39168000 | consumed tokens: 80216064000 | elapsed time per iteration (s): 0.69 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 2.611916E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.158 | TFLOPs: 22.58 | 31: iteration 153100/ 476837 | consumed samples: 39193600 | consumed tokens: 80268492800 | elapsed time per iteration (s): 0.68 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 2.608920E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.210 | TFLOPs: 22.76 | 31: iteration 153200/ 476837 | consumed samples: 39219200 | consumed tokens: 80320921600 | elapsed time per iteration (s): 0.68 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 2.611797E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.499 | TFLOPs: 22.78 | 31: iteration 153300/ 476837 | consumed samples: 39244800 | consumed tokens: 80373350400 | elapsed time per iteration (s): 0.68 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 2.613801E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.446 | TFLOPs: 22.77 | 31: iteration 153400/ 476837 | consumed samples: 39270400 | consumed tokens: 80425779200 | elapsed time per iteration (s): 0.68 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 2.611289E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.864 | TFLOPs: 22.74 | 31: iteration 153500/ 476837 | consumed samples: 39296000 | consumed tokens: 80478208000 | elapsed time per iteration (s): 0.68 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 2.612619E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 375.920 | TFLOPs: 22.74 | 31: iteration 153600/ 476837 | consumed samples: 39321600 | consumed tokens: 80530636800 | elapsed time per iteration (s): 0.68 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 2.611626E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.340 | TFLOPs: 22.77 | 31: iteration 153700/ 476837 | consumed samples: 39347200 | consumed tokens: 80583065600 | elapsed time per iteration (s): 0.68 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 2.610774E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.648 | TFLOPs: 22.79 | 31: iteration 153800/ 476837 | consumed samples: 39372800 | consumed tokens: 80635494400 | elapsed time per iteration (s): 0.68 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 2.612588E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.635 | TFLOPs: 22.79 | 31: iteration 153900/ 476837 | consumed samples: 39398400 | consumed tokens: 80687923200 | elapsed time per iteration (s): 0.68 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 2.612654E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.194 | TFLOPs: 22.76 | 0: [2023-04-27 02:41:04,245] [INFO] [logging.py:68:log_dist] [Rank 0] step=154000, skipped=0, lr=[0.0001591464472155999, 0.0001591464472155999, 0.0001591464472155999], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 154000/ 476837 | consumed samples: 39424000 | consumed tokens: 80740352000 | elapsed time per iteration (s): 0.68 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 2.614872E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 376.574 | TFLOPs: 22.78 | 0: steps: 154000 loss: 2.6056 iter time (s): 0.677 samples/sec: 378.202 31: iteration 154100/ 476837 | consumed samples: 39449600 | consumed tokens: 80792780800 | elapsed time per iteration (s): 0.68 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 2.612798E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.585 | TFLOPs: 22.78 | 31: iteration 154200/ 476837 | consumed samples: 39475200 | consumed tokens: 80845209600 | elapsed time per iteration (s): 0.68 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 2.614086E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.695 | TFLOPs: 22.79 | 31: iteration 154300/ 476837 | consumed samples: 39500800 | consumed tokens: 80897638400 | elapsed time per iteration (s): 0.68 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 2.610986E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.641 | TFLOPs: 22.79 | 31: iteration 154400/ 476837 | consumed samples: 39526400 | consumed tokens: 80950067200 | elapsed time per iteration (s): 0.69 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 2.607642E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.572 | TFLOPs: 22.60 | 31: iteration 154500/ 476837 | consumed samples: 39552000 | consumed tokens: 81002496000 | elapsed time per iteration (s): 0.68 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 2.609377E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.506 | TFLOPs: 22.78 | 31: iteration 154600/ 476837 | consumed samples: 39577600 | consumed tokens: 81054924800 | elapsed 
time per iteration (s): 0.68 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.610438E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.344 | TFLOPs: 22.77 | 31: iteration 154700/ 476837 | consumed samples: 39603200 | consumed tokens: 81107353600 | elapsed time per iteration (s): 0.68 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.611909E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.178 | TFLOPs: 22.76 | 31: iteration 154800/ 476837 | consumed samples: 39628800 | consumed tokens: 81159782400 | elapsed time per iteration (s): 0.68 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.611741E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.266 | TFLOPs: 22.76 | 31: iteration 154900/ 476837 | consumed samples: 39654400 | consumed tokens: 81212211200 | elapsed time per iteration (s): 0.68 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.612401E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.695 | TFLOPs: 22.79 | 31: iteration 155000/ 476837 | consumed samples: 39680000 | consumed tokens: 81264640000 | elapsed time per iteration (s): 0.68 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 2.614200E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.810 | TFLOPs: 22.80 | 31: iteration 155100/ 476837 | consumed samples: 39705600 | consumed tokens: 81317068800 | elapsed time per iteration (s): 0.68 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 2.607531E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.906 | 
TFLOPs: 22.80 | 31: iteration 155200/ 476837 | consumed samples: 39731200 | consumed tokens: 81369497600 | elapsed time per iteration (s): 0.68 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 2.610071E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.577 | TFLOPs: 22.78 | 31: iteration 155300/ 476837 | consumed samples: 39756800 | consumed tokens: 81421926400 | elapsed time per iteration (s): 0.68 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 2.608701E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.788 | TFLOPs: 22.79 | 31: iteration 155400/ 476837 | consumed samples: 39782400 | consumed tokens: 81474355200 | elapsed time per iteration (s): 0.68 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 2.609357E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.885 | TFLOPs: 22.80 | 31: iteration 155500/ 476837 | consumed samples: 39808000 | consumed tokens: 81526784000 | elapsed time per iteration (s): 0.68 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 2.612089E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.724 | TFLOPs: 22.79 | 31: iteration 155600/ 476837 | consumed samples: 39833600 | consumed tokens: 81579212800 | elapsed time per iteration (s): 0.68 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 2.607842E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.810 | TFLOPs: 22.80 | 31: iteration 155700/ 476837 | consumed samples: 39859200 | consumed tokens: 81631641600 | elapsed time per iteration (s): 0.68 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 2.606874E+00 | grad norm: 0.322 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.877 | TFLOPs: 22.80 | 31: iteration 155800/ 476837 | consumed samples: 39884800 | consumed tokens: 81684070400 | elapsed time per iteration (s): 0.69 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 2.606383E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.028 | TFLOPs: 22.57 | 31: iteration 155900/ 476837 | consumed samples: 39910400 | consumed tokens: 81736499200 | elapsed time per iteration (s): 0.68 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 2.607524E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.629 | TFLOPs: 22.79 | 0: [2023-04-27 03:03:44,829] [INFO] [logging.py:68:log_dist] [Rank 0] step=156000, skipped=0, lr=[0.0001581386047812069, 0.0001581386047812069, 0.0001581386047812069], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 156000/ 476837 | consumed samples: 39936000 | consumed tokens: 81788928000 | elapsed time per iteration (s): 0.68 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 2.610254E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.730 | TFLOPs: 22.79 | 0: steps: 156000 loss: 2.6087 iter time (s): 0.677 samples/sec: 378.230 31: iteration 156100/ 476837 | consumed samples: 39961600 | consumed tokens: 81841356800 | elapsed time per iteration (s): 0.68 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 2.612047E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.561 | TFLOPs: 22.78 | 31: iteration 156200/ 476837 | consumed samples: 39987200 | consumed tokens: 81893785600 | elapsed time per iteration (s): 0.68 | learning rate: 1.580E-04 | global batch size: 
256 | lm loss: 2.612962E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.522 | TFLOPs: 22.78 | 31: iteration 156300/ 476837 | consumed samples: 40012800 | consumed tokens: 81946214400 | elapsed time per iteration (s): 0.68 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 2.610926E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.527 | TFLOPs: 22.78 | 31: iteration 156400/ 476837 | consumed samples: 40038400 | consumed tokens: 81998643200 | elapsed time per iteration (s): 0.68 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 2.609952E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.531 | TFLOPs: 22.78 | 31: iteration 156500/ 476837 | consumed samples: 40064000 | consumed tokens: 82051072000 | elapsed time per iteration (s): 0.68 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 2.608862E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.879 | TFLOPs: 22.74 | 31: iteration 156600/ 476837 | consumed samples: 40089600 | consumed tokens: 82103500800 | elapsed time per iteration (s): 0.68 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 2.612127E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.071 | TFLOPs: 22.63 | 31: iteration 156700/ 476837 | consumed samples: 40115200 | consumed tokens: 82155929600 | elapsed time per iteration (s): 0.68 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 2.614400E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.113 | TFLOPs: 22.63 | 31: iteration 156800/ 476837 | consumed samples: 40140800 | 
consumed tokens: 82208358400 | elapsed time per iteration (s): 0.68 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 2.608066E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.597 | TFLOPs: 22.78 | 31: iteration 156900/ 476837 | consumed samples: 40166400 | consumed tokens: 82260787200 | elapsed time per iteration (s): 0.68 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 2.601706E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.459 | TFLOPs: 22.77 | 31: iteration 157000/ 476837 | consumed samples: 40192000 | consumed tokens: 82313216000 | elapsed time per iteration (s): 0.68 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 2.608383E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.407 | TFLOPs: 22.77 | 31: iteration 157100/ 476837 | consumed samples: 40217600 | consumed tokens: 82365644800 | elapsed time per iteration (s): 0.69 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 2.608763E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.471 | TFLOPs: 22.47 | 31: iteration 157200/ 476837 | consumed samples: 40243200 | consumed tokens: 82418073600 | elapsed time per iteration (s): 0.69 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 2.603810E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.649 | TFLOPs: 22.42 | 31: iteration 157300/ 476837 | consumed samples: 40268800 | consumed tokens: 82470502400 | elapsed time per iteration (s): 0.69 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 2.604429E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 371.935 | TFLOPs: 22.50 | 31: iteration 157400/ 476837 | consumed samples: 40294400 | consumed tokens: 82522931200 | elapsed time per iteration (s): 0.68 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 2.605055E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.943 | TFLOPs: 22.68 | 31: iteration 157500/ 476837 | consumed samples: 40320000 | consumed tokens: 82575360000 | elapsed time per iteration (s): 0.68 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 2.608511E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.687 | TFLOPs: 22.79 | 31: iteration 157600/ 476837 | consumed samples: 40345600 | consumed tokens: 82627788800 | elapsed time per iteration (s): 0.68 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 2.609631E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.480 | TFLOPs: 22.78 | 31: iteration 157700/ 476837 | consumed samples: 40371200 | consumed tokens: 82680217600 | elapsed time per iteration (s): 0.68 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 2.608546E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.126 | TFLOPs: 22.75 | 31: iteration 157800/ 476837 | consumed samples: 40396800 | consumed tokens: 82732646400 | elapsed time per iteration (s): 0.69 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 2.603309E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.137 | TFLOPs: 22.57 | 31: iteration 157900/ 476837 | consumed samples: 40422400 | consumed tokens: 82785075200 | elapsed time per iteration (s): 0.68 | learning rate: 1.572E-04 | global batch size: 
256 | lm loss: 2.609474E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.332 | TFLOPs: 22.77 | 0: [2023-04-27 03:26:29,549] [INFO] [logging.py:68:log_dist] [Rank 0] step=158000, skipped=0, lr=[0.00015712223456169992, 0.00015712223456169992, 0.00015712223456169992], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 158000/ 476837 | consumed samples: 40448000 | consumed tokens: 82837504000 | elapsed time per iteration (s): 0.68 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 2.605761E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.133 | TFLOPs: 22.76 | 0: steps: 158000 loss: 2.6263 iter time (s): 0.679 samples/sec: 377.044 31: iteration 158100/ 476837 | consumed samples: 40473600 | consumed tokens: 82889932800 | elapsed time per iteration (s): 0.68 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 2.604554E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.091 | TFLOPs: 22.75 | 31: iteration 158200/ 476837 | consumed samples: 40499200 | consumed tokens: 82942361600 | elapsed time per iteration (s): 0.68 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 2.607281E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.973 | TFLOPs: 22.75 | 31: iteration 158300/ 476837 | consumed samples: 40524800 | consumed tokens: 82994790400 | elapsed time per iteration (s): 0.68 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 2.605179E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.228 | TFLOPs: 22.70 | 31: iteration 158400/ 476837 | consumed samples: 40550400 | consumed tokens: 83047219200 | elapsed time per iteration (s): 
0.68 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 2.603042E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.280 | TFLOPs: 22.70 | 31: iteration 158500/ 476837 | consumed samples: 40576000 | consumed tokens: 83099648000 | elapsed time per iteration (s): 0.68 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 2.606800E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.330 | TFLOPs: 22.71 | 31: iteration 158600/ 476837 | consumed samples: 40601600 | consumed tokens: 83152076800 | elapsed time per iteration (s): 0.68 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 2.607006E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.358 | TFLOPs: 22.71 | 31: iteration 158700/ 476837 | consumed samples: 40627200 | consumed tokens: 83204505600 | elapsed time per iteration (s): 0.69 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 2.603843E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.596 | TFLOPs: 22.42 | 31: iteration 158800/ 476837 | consumed samples: 40652800 | consumed tokens: 83256934400 | elapsed time per iteration (s): 0.69 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 2.607275E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.591 | TFLOPs: 22.54 | 31: iteration 158900/ 476837 | consumed samples: 40678400 | consumed tokens: 83309363200 | elapsed time per iteration (s): 0.68 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 2.606874E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.614 | TFLOPs: 22.72 | 31: 
iteration 159000/ 476837 | consumed samples: 40704000 | consumed tokens: 83361792000 | elapsed time per iteration (s): 0.68 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 2.606211E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.176 | TFLOPs: 22.70 | 31: iteration 159100/ 476837 | consumed samples: 40729600 | consumed tokens: 83414220800 | elapsed time per iteration (s): 0.68 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 2.607617E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.365 | TFLOPs: 22.71 | 31: iteration 159200/ 476837 | consumed samples: 40755200 | consumed tokens: 83466649600 | elapsed time per iteration (s): 0.68 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 2.601635E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.467 | TFLOPs: 22.71 | 31: iteration 159300/ 476837 | consumed samples: 40780800 | consumed tokens: 83519078400 | elapsed time per iteration (s): 0.68 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 2.603610E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.444 | TFLOPs: 22.71 | 31: iteration 159400/ 476837 | consumed samples: 40806400 | consumed tokens: 83571507200 | elapsed time per iteration (s): 0.68 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.608083E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.763 | TFLOPs: 22.73 | 31: iteration 159500/ 476837 | consumed samples: 40832000 | consumed tokens: 83623936000 | elapsed time per iteration (s): 0.68 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.603932E+00 | grad norm: 0.384 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.281 | TFLOPs: 22.76 | 31: iteration 159600/ 476837 | consumed samples: 40857600 | consumed tokens: 83676364800 | elapsed time per iteration (s): 0.68 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.607446E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.207 | TFLOPs: 22.76 | 31: iteration 159700/ 476837 | consumed samples: 40883200 | consumed tokens: 83728793600 | elapsed time per iteration (s): 0.68 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.606451E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.291 | TFLOPs: 22.76 | 31: iteration 159800/ 476837 | consumed samples: 40908800 | consumed tokens: 83781222400 | elapsed time per iteration (s): 0.68 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 2.604617E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.610 | TFLOPs: 22.78 | 31: iteration 159900/ 476837 | consumed samples: 40934400 | consumed tokens: 83833651200 | elapsed time per iteration (s): 0.68 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.608387E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.644 | TFLOPs: 22.79 | 0: [2023-04-27 03:49:13,445] [INFO] [logging.py:68:log_dist] [Rank 0] step=160000, skipped=0, lr=[0.0001560975166077236, 0.0001560975166077236, 0.0001560975166077236], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 160000/ 476837 | consumed samples: 40960000 | consumed tokens: 83886080000 | elapsed time per iteration (s): 0.68 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.605588E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 376.718 | TFLOPs: 22.79 | 0: steps: 160000 loss: 2.5988 iter time (s): 0.679 samples/sec: 377.277 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 160000 | lm loss value: 2.976902E+00 | lm loss PPL: 1.962691E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 160000 to checkpoints_1b1250b1b5 0: [2023-04-27 03:49:13,760] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step160000 is begin to save! 0: [2023-04-27 03:49:14,607] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_01-model_00-model_states.pt... 0: [2023-04-27 03:49:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_01-model_00-model_states.pt. 0: [2023-04-27 03:49:15,058] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_03-model_00-model_states.pt... 0: [2023-04-27 03:49:15,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_03-model_00-model_states.pt. 0: [2023-04-27 03:49:15,183] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_04-model_00-model_states.pt... 0: [2023-04-27 03:49:15,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_04-model_00-model_states.pt. 0: [2023-04-27 03:49:15,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_05-model_00-model_states.pt... 0: [2023-04-27 03:49:15,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_05-model_00-model_states.pt. 
0: [2023-04-27 03:49:15,436] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_06-model_00-model_states.pt... 0: [2023-04-27 03:49:15,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_06-model_00-model_states.pt. 0: [2023-04-27 03:49:15,530] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_07-model_00-model_states.pt... 0: [2023-04-27 03:49:15,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_07-model_00-model_states.pt. 0: [2023-04-27 03:49:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_08-model_00-model_states.pt... 0: [2023-04-27 03:49:15,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_08-model_00-model_states.pt. 0: [2023-04-27 03:49:15,720] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_09-model_00-model_states.pt... 0: [2023-04-27 03:49:15,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_09-model_00-model_states.pt. 0: [2023-04-27 03:49:15,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_10-model_00-model_states.pt... 0: [2023-04-27 03:49:15,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_10-model_00-model_states.pt. 0: [2023-04-27 03:49:15,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_11-model_00-model_states.pt... 0: [2023-04-27 03:49:16,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_11-model_00-model_states.pt. 
0: [2023-04-27 03:49:16,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_12-model_00-model_states.pt... 0: [2023-04-27 03:49:16,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_12-model_00-model_states.pt. 0: [2023-04-27 03:49:16,100] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_13-model_00-model_states.pt... 0: [2023-04-27 03:49:16,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_13-model_00-model_states.pt. 0: [2023-04-27 03:49:16,188] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_14-model_00-model_states.pt... 0: [2023-04-27 03:49:16,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_14-model_00-model_states.pt. 0: [2023-04-27 03:49:16,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_15-model_00-model_states.pt... 0: [2023-04-27 03:49:16,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_15-model_00-model_states.pt. 0: [2023-04-27 03:49:16,367] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_16-model_00-model_states.pt... 0: [2023-04-27 03:49:16,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_16-model_00-model_states.pt. 0: [2023-04-27 03:49:16,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_17-model_00-model_states.pt... 0: [2023-04-27 03:49:16,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_17-model_00-model_states.pt. 
0: [2023-04-27 03:49:16,553] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_18-model_00-model_states.pt... 0: [2023-04-27 03:49:16,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_18-model_00-model_states.pt. 0: [2023-04-27 03:49:16,643] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_19-model_00-model_states.pt... 0: [2023-04-27 03:49:16,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_19-model_00-model_states.pt. 0: [2023-04-27 03:49:16,730] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_20-model_00-model_states.pt... 0: [2023-04-27 03:49:16,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_20-model_00-model_states.pt. 0: [2023-04-27 03:49:16,819] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_21-model_00-model_states.pt... 0: [2023-04-27 03:49:16,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_21-model_00-model_states.pt. 0: [2023-04-27 03:49:16,910] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_22-model_00-model_states.pt... 0: [2023-04-27 03:49:17,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_22-model_00-model_states.pt. 0: [2023-04-27 03:49:17,001] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_23-model_00-model_states.pt... 0: [2023-04-27 03:49:17,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_23-model_00-model_states.pt. 
0: [2023-04-27 03:49:17,088] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_24-model_00-model_states.pt... 0: [2023-04-27 03:49:17,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_24-model_00-model_states.pt. 0: [2023-04-27 03:49:17,182] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_25-model_00-model_states.pt... 0: [2023-04-27 03:49:17,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_25-model_00-model_states.pt. 0: [2023-04-27 03:49:17,270] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_26-model_00-model_states.pt... 0: [2023-04-27 03:49:17,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_26-model_00-model_states.pt. 0: [2023-04-27 03:49:17,358] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_27-model_00-model_states.pt... 0: [2023-04-27 03:49:17,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_27-model_00-model_states.pt. 0: [2023-04-27 03:49:17,448] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_28-model_00-model_states.pt... 0: [2023-04-27 03:49:17,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_28-model_00-model_states.pt. 0: [2023-04-27 03:49:17,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/layer_30-model_00-model_states.pt... 0: [2023-04-27 03:49:17,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/layer_30-model_00-model_states.pt. 
0: [2023-04-27 03:49:17,541] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step160000/mp_rank_00_model_states.pt 0: [2023-04-27 03:49:17,541] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/mp_rank_00_model_states.pt... 0: [2023-04-27 03:49:17,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/mp_rank_00_model_states.pt. 0: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 22: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 4: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 26: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 8: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 8: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 14: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 
14: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 14: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 25: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 28: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 28: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 16: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 16: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 16: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 16: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 4: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 7: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 8: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 8: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 8: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 8: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 11: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 11: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 
11: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 11: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 3: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 9: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 9: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 9: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 9: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 9: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 14: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 14: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 
15: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 15: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 15: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 20: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 20: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 19: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 19: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 18: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 24: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 24: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 
24: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 24: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 24: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 17: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 17: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 17: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 17: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 29: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 29: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 25: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 
25: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 28: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 28: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 26: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 26: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 31: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 16: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 16: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 22: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 
22: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 22: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 0: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 4: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 1: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 5: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 8: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 11: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 11: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 3: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 10: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 10: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 10: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 
10: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 10: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 9: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 14: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 15: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 15: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 12: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 12: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 12: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 12: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 12: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 
13: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 20: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 20: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 19: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 18: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 18: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 18: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 24: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 17: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 17: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 
17: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 27: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 27: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 27: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 27: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 21: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 21: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 21: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 21: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 
23: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 23: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 29: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 29: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 29: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 25: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 28: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 28: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 26: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 
30: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 30: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 30: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 30: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 31: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 31: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 31: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 31: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 16: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 22: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 
22: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 6: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 2: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 8: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 11: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 3: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 10: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 
9: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 9: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 14: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 15: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 15: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 12: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 12: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 13: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 13: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 13: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 
13: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 20: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 19: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 19: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 18: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 18: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 18: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 18: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 24: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 24: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 
17: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 27: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 27: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 21: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 21: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 21: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 23: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 23: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 29: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 25: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 
25: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 28: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 28: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 26: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 26: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 30: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 30: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 31: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 31: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 16: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 
22: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 6: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 7: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 3: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 10: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 10: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 14: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 
15: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 12: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 13: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 13: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 20: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 19: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 19: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 27: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 21: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 23: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 
23: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 23: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 23: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 29: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 25: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 26: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 26: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 30: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 22: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 6: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
0: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 20: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 19: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 29: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 30: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 20: [2023-04-27 03:49:17,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 0: [2023-04-27 03:49:17,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-27 03:49:17,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-27 03:49:17,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-04-27 03:49:17,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-04-27 03:49:17,758] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-27 03:49:17,758] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-04-27 03:49:17,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-04-27 03:49:17,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-27 03:49:17,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-27 03:49:17,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-04-27 03:49:17,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-04-27 03:49:17,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-27 03:49:17,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-04-27 03:49:17,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-27 03:49:17,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-27 03:49:17,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
0: [2023-04-27 03:49:17,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-04-27 03:49:17,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-27 03:49:17,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-04-27 03:49:17,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-27 03:49:17,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-27 03:49:17,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 20: [2023-04-27 03:49:17,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-27 03:49:17,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-27 03:49:17,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 20: [2023-04-27 03:49:17,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-27 03:49:17,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 
20: [2023-04-27 03:49:17,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-27 03:49:17,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-27 03:49:17,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-27 03:49:17,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2023-04-27 03:49:17,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-27 03:49:17,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 20: [2023-04-27 03:49:17,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 20: [2023-04-27 03:49:17,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 20: [2023-04-27 03:49:17,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 20: [2023-04-27 03:49:17,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 27: [2023-04-27 03:49:17,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 27: [2023-04-27 03:49:17,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 
27: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-27 03:49:17,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 27: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 27: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
27: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 27: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 
8: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 8: [2023-04-27 03:49:17,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 8: [2023-04-27 03:49:17,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-04-27 03:49:17,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-27 03:49:17,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-04-27 03:49:17,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-04-27 03:49:17,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-04-27 03:49:17,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-04-27 03:49:17,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-27 03:49:17,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-27 03:49:17,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-27 03:49:17,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-27 03:49:17,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-27 03:49:17,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-04-27 03:49:17,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-04-27 03:49:17,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-04-27 03:49:17,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-04-27 03:49:17,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 
18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 
18: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 18: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
18: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 18: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 
14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step160000 is ready now! 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 14: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 25: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 
25: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 25: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 25: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 25: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 
9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2023-04-27 03:49:17,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 9: [2023-04-27 03:49:17,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-27 03:49:17,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-27 03:49:17,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-27 03:49:17,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-27 03:49:17,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-27 03:49:17,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 25: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-27 03:49:17,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 25: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 9: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 9: [2023-04-27 03:49:17,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-27 03:49:17,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 25: [2023-04-27 03:49:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 
25: [2023-04-27 03:49:17,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-27 03:49:17,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 25: [2023-04-27 03:49:17,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-27 03:49:17,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 25: [2023-04-27 03:49:17,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-27 03:49:17,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-27 03:49:17,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 25: [2023-04-27 03:49:17,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 
16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 16: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-27 03:49:17,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 16: [2023-04-27 03:49:17,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-27 03:49:17,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-27 03:49:17,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-27 03:49:17,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-27 03:49:17,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-27 03:49:17,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-27 03:49:17,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-27 03:49:17,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-04-27 03:49:17,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 27: [2023-04-27 03:49:17,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2023-04-27 03:49:17,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-27 03:49:17,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-27 03:49:17,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-04-27 03:49:17,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-04-27 03:49:17,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-04-27 03:49:17,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-04-27 03:49:17,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 8: [2023-04-27 03:49:17,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 0: [2023-04-27 03:49:17,841] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-04-27 03:49:17,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 
31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 31: [2023-04-27 03:49:17,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-27 03:49:17,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-27 03:49:17,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-27 03:49:17,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 31: [2023-04-27 03:49:17,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
31: [2023-04-27 03:49:17,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-27 03:49:17,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-27 03:49:17,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-27 03:49:17,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 27: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 
27: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 20: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 27: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 20: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 8: [2023-04-27 03:49:17,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-27 03:49:17,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 
11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 11: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 24: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 
24: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 24: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 24: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-27 03:49:17,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2023-04-27 03:49:17,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-27 03:49:17,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-27 03:49:17,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2023-04-27 03:49:17,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-27 03:49:17,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-27 03:49:17,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-27 03:49:17,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-27 03:49:17,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 24: [2023-04-27 03:49:17,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 24: [2023-04-27 03:49:17,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 24: [2023-04-27 03:49:17,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 24: [2023-04-27 03:49:17,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 24: [2023-04-27 03:49:17,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 24: [2023-04-27 03:49:17,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 24: [2023-04-27 03:49:17,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 
28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 28: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-27 03:49:17,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 20: [2023-04-27 03:49:17,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 28: [2023-04-27 03:49:17,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 20: [2023-04-27 03:49:17,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-27 03:49:17,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 23: [2023-04-27 03:49:17,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 
23: [2023-04-27 03:49:17,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2023-04-27 03:49:17,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 23: [2023-04-27 03:49:17,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 23: [2023-04-27 03:49:17,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-27 03:49:17,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2023-04-27 03:49:17,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-27 03:49:17,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2023-04-27 03:49:17,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-27 03:49:17,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-27 03:49:17,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-27 03:49:17,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
23: [2023-04-27 03:49:17,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-27 03:49:17,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 23: [2023-04-27 03:49:17,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 23: [2023-04-27 03:49:17,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 23: [2023-04-27 03:49:17,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 23: [2023-04-27 03:49:17,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 23: [2023-04-27 03:49:17,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 23: [2023-04-27 03:49:17,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2023-04-27 03:49:17,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 23: [2023-04-27 03:49:17,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 23: [2023-04-27 03:49:17,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-27 03:49:17,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
28: [2023-04-27 03:49:17,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2023-04-27 03:49:17,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 3: [2023-04-27 03:49:17,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-04-27 03:49:17,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-27 03:49:17,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 
29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2023-04-27 03:49:17,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-27 03:49:17,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 29: [2023-04-27 03:49:17,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2023-04-27 03:49:17,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-27 03:49:17,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-27 03:49:17,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-27 03:49:17,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2023-04-27 03:49:17,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step160000 is ready now! 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 29: [2023-04-27 03:49:17,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 28: [2023-04-27 03:49:17,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-04-27 03:49:17,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-27 03:49:17,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-27 03:49:17,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-04-27 03:49:17,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-27 03:49:17,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-27 03:49:17,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-27 03:49:17,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-27 03:49:17,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-04-27 03:49:17,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-04-27 03:49:17,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-27 03:49:17,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-04-27 03:49:17,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 27: [2023-04-27 03:49:17,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2023-04-27 03:49:17,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-27 03:49:17,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
6: [2023-04-27 03:49:17,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-27 03:49:17,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-27 03:49:17,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 17: [2023-04-27 03:49:17,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 
21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 17: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 17: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 17: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 21: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 20: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 20: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 20: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 17: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 21: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 17: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-04-27 03:49:17,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-27 03:49:17,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-04-27 03:49:17,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-27 03:49:17,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-27 03:49:17,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-04-27 03:49:17,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-04-27 03:49:17,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-27 03:49:17,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-04-27 03:49:17,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-27 03:49:17,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 
10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 
10: [2023-04-27 03:49:17,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 10: [2023-04-27 03:49:17,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 10: [2023-04-27 03:49:17,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2023-04-27 03:49:17,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2023-04-27 03:49:17,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-27 03:49:17,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2023-04-27 03:49:17,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
10: [2023-04-27 03:49:17,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 10: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 30: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 30: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-27 03:49:17,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 
30: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 30: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 30: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 30: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
30: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 30: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 30: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 30: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 30: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 
26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-27 03:49:17,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step160000 is ready now! 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 26: [2023-04-27 03:49:17,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 
22: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 22: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 
22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 
15: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 22: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 15: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 22: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 15: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 22: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 15: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 22: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 15: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 15: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-27 03:49:17,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 15: [2023-04-27 03:49:17,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-27 03:49:17,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-27 03:49:17,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 
12: [2023-04-27 03:49:17,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 12: [2023-04-27 03:49:17,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 12: [2023-04-27 03:49:17,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-27 03:49:17,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-27 03:49:17,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 12: [2023-04-27 03:49:17,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
1: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 7: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
1: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-27 03:49:17,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-04-27 03:49:17,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-04-27 03:49:17,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-04-27 03:49:17,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-04-27 03:49:17,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-04-27 03:49:17,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-27 03:49:17,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-27 03:49:17,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
7: [2023-04-27 03:49:17,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-04-27 03:49:17,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-27 03:49:17,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-04-27 03:49:17,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-27 03:49:17,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-27 03:49:17,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-27 03:49:17,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-27 03:49:17,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-27 03:49:17,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-04-27 03:49:17,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
1: [2023-04-27 03:49:17,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-04-27 03:49:17,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-27 03:49:17,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-04-27 03:49:17,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-04-27 03:49:17,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-27 03:49:17,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 
13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-27 03:49:17,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-27 03:49:17,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-27 03:49:17,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2023-04-27 03:49:17,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-27 03:49:17,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-27 03:49:17,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-27 03:49:17,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 13: [2023-04-27 03:49:17,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 12: [2023-04-27 03:49:17,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 12: [2023-04-27 03:49:17,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-27 03:49:17,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 
19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 19: [2023-04-27 03:49:17,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 19: [2023-04-27 03:49:17,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2023-04-27 03:49:17,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-27 03:49:17,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 19: [2023-04-27 03:49:17,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
19: [2023-04-27 03:49:17,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-27 03:49:17,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-27 03:49:17,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 19: [2023-04-27 03:49:17,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-04-27 03:49:17,905] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-04-27 03:49:17,905] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-04-27 03:49:17,905] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
2: [2023-04-27 03:49:17,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-27 03:49:17,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-27 03:49:17,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 13: [2023-04-27 03:49:17,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 13: [2023-04-27 03:49:17,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step160000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2023-04-27 03:49:17,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: successfully saved checkpoint at iteration 160000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 4247.82 31: iteration 160100/ 476837 | consumed samples: 40985600 | consumed tokens: 83938508800 | elapsed time per iteration (s): 0.73 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 2.602725E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 349.148 | TFLOPs: 21.12 | 31: iteration 160200/ 476837 | consumed samples: 41011200 | consumed tokens: 83990937600 | elapsed time per iteration (s): 0.68 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 2.607726E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.055 | TFLOPs: 22.75 | 31: iteration 160300/ 476837 | consumed samples: 41036800 | consumed tokens: 84043366400 | elapsed time per iteration (s): 0.68 | learning rate: 1.559E-04 | global batch size: 256 | 
lm loss: 2.608197E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.836 | TFLOPs: 22.80 | 31: iteration 160400/ 476837 | consumed samples: 41062400 | consumed tokens: 84095795200 | elapsed time per iteration (s): 0.68 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 2.609529E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.921 | TFLOPs: 22.80 | 31: iteration 160500/ 476837 | consumed samples: 41088000 | consumed tokens: 84148224000 | elapsed time per iteration (s): 0.68 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.603528E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.822 | TFLOPs: 22.80 | 31: iteration 160600/ 476837 | consumed samples: 41113600 | consumed tokens: 84200652800 | elapsed time per iteration (s): 0.68 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.603640E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.615 | TFLOPs: 22.78 | 31: iteration 160700/ 476837 | consumed samples: 41139200 | consumed tokens: 84253081600 | elapsed time per iteration (s): 0.68 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.602422E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.650 | TFLOPs: 22.79 | 31: iteration 160800/ 476837 | consumed samples: 41164800 | consumed tokens: 84305510400 | elapsed time per iteration (s): 0.68 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.607407E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.636 | TFLOPs: 22.79 | 31: iteration 160900/ 476837 | consumed samples: 41190400 | consumed 
tokens: 84357939200 | elapsed time per iteration (s): 0.68 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 2.606390E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.519 | TFLOPs: 22.78 | 31: iteration 161000/ 476837 | consumed samples: 41216000 | consumed tokens: 84410368000 | elapsed time per iteration (s): 0.68 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 2.604827E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.588 | TFLOPs: 22.78 | 31: iteration 161100/ 476837 | consumed samples: 41241600 | consumed tokens: 84462796800 | elapsed time per iteration (s): 0.68 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 2.602999E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.294 | TFLOPs: 22.76 | 31: iteration 161200/ 476837 | consumed samples: 41267200 | consumed tokens: 84515225600 | elapsed time per iteration (s): 0.68 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 2.606679E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.332 | TFLOPs: 22.77 | 31: iteration 161300/ 476837 | consumed samples: 41292800 | consumed tokens: 84567654400 | elapsed time per iteration (s): 0.68 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.607410E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.210 | TFLOPs: 22.76 | 31: iteration 161400/ 476837 | consumed samples: 41318400 | consumed tokens: 84620083200 | elapsed time per iteration (s): 0.68 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.604185E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 376.138 | TFLOPs: 22.76 | 31: iteration 161500/ 476837 | consumed samples: 41344000 | consumed tokens: 84672512000 | elapsed time per iteration (s): 0.68 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.606908E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.969 | TFLOPs: 22.68 | 31: iteration 161600/ 476837 | consumed samples: 41369600 | consumed tokens: 84724940800 | elapsed time per iteration (s): 0.69 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.605166E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.486 | TFLOPs: 22.47 | 31: iteration 161700/ 476837 | consumed samples: 41395200 | consumed tokens: 84777369600 | elapsed time per iteration (s): 0.68 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 2.606429E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.630 | TFLOPs: 22.79 | 31: iteration 161800/ 476837 | consumed samples: 41420800 | consumed tokens: 84829798400 | elapsed time per iteration (s): 0.68 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 2.599398E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.525 | TFLOPs: 22.78 | 31: iteration 161900/ 476837 | consumed samples: 41446400 | consumed tokens: 84882227200 | elapsed time per iteration (s): 0.68 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.609046E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.587 | TFLOPs: 22.78 | 0: [2023-04-27 04:11:59,772] [INFO] [logging.py:68:log_dist] [Rank 0] step=162000, skipped=0, lr=[0.00015506463244872946, 0.00015506463244872946, 0.00015506463244872946], mom=[(0.9, 0.999), 
(0.9, 0.999), (0.9, 0.999)] 31: iteration 162000/ 476837 | consumed samples: 41472000 | consumed tokens: 84934656000 | elapsed time per iteration (s): 0.68 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.603453E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.617 | TFLOPs: 22.78 | 0: steps: 162000 loss: 2.6364 iter time (s): 0.678 samples/sec: 377.402 31: iteration 162100/ 476837 | consumed samples: 41497600 | consumed tokens: 84987084800 | elapsed time per iteration (s): 0.68 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 2.602930E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.680 | TFLOPs: 22.79 | 31: iteration 162200/ 476837 | consumed samples: 41523200 | consumed tokens: 85039513600 | elapsed time per iteration (s): 0.68 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 2.604831E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.683 | TFLOPs: 22.79 | 31: iteration 162300/ 476837 | consumed samples: 41548800 | consumed tokens: 85091942400 | elapsed time per iteration (s): 0.68 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 2.604821E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.744 | TFLOPs: 22.79 | 31: iteration 162400/ 476837 | consumed samples: 41574400 | consumed tokens: 85144371200 | elapsed time per iteration (s): 0.68 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 2.600395E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.826 | TFLOPs: 22.80 | 31: iteration 162500/ 476837 | consumed samples: 41600000 | consumed tokens: 85196800000 | elapsed time per iteration (s): 0.68 | learning 
rate: 1.548E-04 | global batch size: 256 | lm loss: 2.603814E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.876 | TFLOPs: 22.80 | 31: iteration 162600/ 476837 | consumed samples: 41625600 | consumed tokens: 85249228800 | elapsed time per iteration (s): 0.68 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 2.601992E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.744 | TFLOPs: 22.79 | 31: iteration 162700/ 476837 | consumed samples: 41651200 | consumed tokens: 85301657600 | elapsed time per iteration (s): 0.68 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 2.601803E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.909 | TFLOPs: 22.80 | 31: iteration 162800/ 476837 | consumed samples: 41676800 | consumed tokens: 85354086400 | elapsed time per iteration (s): 0.69 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 2.601678E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.584 | TFLOPs: 22.36 | 31: iteration 162900/ 476837 | consumed samples: 41702400 | consumed tokens: 85406515200 | elapsed time per iteration (s): 0.69 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 2.605889E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.386 | TFLOPs: 22.59 | 31: iteration 163000/ 476837 | consumed samples: 41728000 | consumed tokens: 85458944000 | elapsed time per iteration (s): 0.68 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.602724E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.678 | TFLOPs: 22.67 | 31: iteration 163100/ 
476837 | consumed samples: 41753600 | consumed tokens: 85511372800 | elapsed time per iteration (s): 0.69 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.600424E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.743 | TFLOPs: 22.31 | 31: iteration 163200/ 476837 | consumed samples: 41779200 | consumed tokens: 85563801600 | elapsed time per iteration (s): 0.68 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.601007E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.940 | TFLOPs: 22.80 | 31: iteration 163300/ 476837 | consumed samples: 41804800 | consumed tokens: 85616230400 | elapsed time per iteration (s): 0.68 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.601821E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.629 | TFLOPs: 22.79 | 31: iteration 163400/ 476837 | consumed samples: 41830400 | consumed tokens: 85668659200 | elapsed time per iteration (s): 0.68 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 2.602242E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.528 | TFLOPs: 22.78 | 31: iteration 163500/ 476837 | consumed samples: 41856000 | consumed tokens: 85721088000 | elapsed time per iteration (s): 0.68 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 2.596479E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.678 | TFLOPs: 22.67 | 31: iteration 163600/ 476837 | consumed samples: 41881600 | consumed tokens: 85773516800 | elapsed time per iteration (s): 0.68 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.603796E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 376.508 | TFLOPs: 22.78 | 31: iteration 163700/ 476837 | consumed samples: 41907200 | consumed tokens: 85825945600 | elapsed time per iteration (s): 0.68 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.600989E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.652 | TFLOPs: 22.79 | 31: iteration 163800/ 476837 | consumed samples: 41932800 | consumed tokens: 85878374400 | elapsed time per iteration (s): 0.68 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.602886E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.722 | TFLOPs: 22.79 | 31: iteration 163900/ 476837 | consumed samples: 41958400 | consumed tokens: 85930803200 | elapsed time per iteration (s): 0.68 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.600090E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.740 | TFLOPs: 22.79 | 0: [2023-04-27 04:34:43,054] [INFO] [logging.py:68:log_dist] [Rank 0] step=164000, skipped=0, lr=[0.00015402376506081723, 0.00015402376506081723, 0.00015402376506081723], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 164000/ 476837 | consumed samples: 41984000 | consumed tokens: 85983232000 | elapsed time per iteration (s): 0.68 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 2.600249E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.331 | TFLOPs: 22.77 | 0: steps: 164000 loss: 2.6150 iter time (s): 0.678 samples/sec: 377.442 31: iteration 164100/ 476837 | consumed samples: 42009600 | consumed tokens: 86035660800 | elapsed time per iteration (s): 0.68 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 2.602081E+00 | 
grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.802 | TFLOPs: 22.80 | 31: iteration 164200/ 476837 | consumed samples: 42035200 | consumed tokens: 86088089600 | elapsed time per iteration (s): 0.68 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 2.601340E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.774 | TFLOPs: 22.79 | 31: iteration 164300/ 476837 | consumed samples: 42060800 | consumed tokens: 86140518400 | elapsed time per iteration (s): 0.68 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 2.600301E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.892 | TFLOPs: 22.80 | 31: iteration 164400/ 476837 | consumed samples: 42086400 | consumed tokens: 86192947200 | elapsed time per iteration (s): 0.68 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.598838E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.964 | TFLOPs: 22.68 | 31: iteration 164500/ 476837 | consumed samples: 42112000 | consumed tokens: 86245376000 | elapsed time per iteration (s): 0.68 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.600092E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.582 | TFLOPs: 22.72 | 31: iteration 164600/ 476837 | consumed samples: 42137600 | consumed tokens: 86297804800 | elapsed time per iteration (s): 0.69 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.604039E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.556 | TFLOPs: 22.42 | 31: iteration 164700/ 476837 | consumed samples: 42163200 | consumed tokens: 86350233600 | 
elapsed time per iteration (s): 0.68 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.599770E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.879 | TFLOPs: 22.80 | 31: iteration 164800/ 476837 | consumed samples: 42188800 | consumed tokens: 86402662400 | elapsed time per iteration (s): 0.68 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 2.596826E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.785 | TFLOPs: 22.79 | 31: iteration 164900/ 476837 | consumed samples: 42214400 | consumed tokens: 86455091200 | elapsed time per iteration (s): 0.68 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 2.600645E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.734 | TFLOPs: 22.79 | 31: iteration 165000/ 476837 | consumed samples: 42240000 | consumed tokens: 86507520000 | elapsed time per iteration (s): 0.68 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 2.599388E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.815 | TFLOPs: 22.80 | 31: iteration 165100/ 476837 | consumed samples: 42265600 | consumed tokens: 86559948800 | elapsed time per iteration (s): 0.68 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.595312E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.801 | TFLOPs: 22.80 | 31: iteration 165200/ 476837 | consumed samples: 42291200 | consumed tokens: 86612377600 | elapsed time per iteration (s): 0.68 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.603117E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
376.692 | TFLOPs: 22.79 | 31: iteration 165300/ 476837 | consumed samples: 42316800 | consumed tokens: 86664806400 | elapsed time per iteration (s): 0.68 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.604242E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.143 | TFLOPs: 22.76 | 31: iteration 165400/ 476837 | consumed samples: 42342400 | consumed tokens: 86717235200 | elapsed time per iteration (s): 0.68 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.600296E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.834 | TFLOPs: 22.80 | 31: iteration 165500/ 476837 | consumed samples: 42368000 | consumed tokens: 86769664000 | elapsed time per iteration (s): 0.68 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.599885E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.909 | TFLOPs: 22.80 | 31: iteration 165600/ 476837 | consumed samples: 42393600 | consumed tokens: 86822092800 | elapsed time per iteration (s): 0.68 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.600802E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.913 | TFLOPs: 22.80 | 31: iteration 165700/ 476837 | consumed samples: 42419200 | consumed tokens: 86874521600 | elapsed time per iteration (s): 0.68 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.600480E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.597 | TFLOPs: 22.78 | 31: iteration 165800/ 476837 | consumed samples: 42444800 | consumed tokens: 86926950400 | elapsed time per iteration (s): 0.68 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.603184E+00 | grad 
norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.814 | TFLOPs: 22.74 | 31: iteration 165900/ 476837 | consumed samples: 42470400 | consumed tokens: 86979379200 | elapsed time per iteration (s): 0.68 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.595626E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.141 | TFLOPs: 22.70 | 0: [2023-04-27 04:57:24,112] [INFO] [logging.py:68:log_dist] [Rank 0] step=166000, skipped=0, lr=[0.00015297509883432098, 0.00015297509883432098, 0.00015297509883432098], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 166000/ 476837 | consumed samples: 42496000 | consumed tokens: 87031808000 | elapsed time per iteration (s): 0.68 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.599499E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.044 | TFLOPs: 22.81 | 0: steps: 166000 loss: 2.6080 iter time (s): 0.677 samples/sec: 378.140 31: iteration 166100/ 476837 | consumed samples: 42521600 | consumed tokens: 87084236800 | elapsed time per iteration (s): 0.69 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.601825E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.173 | TFLOPs: 22.39 | 31: iteration 166200/ 476837 | consumed samples: 42547200 | consumed tokens: 87136665600 | elapsed time per iteration (s): 0.68 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.598266E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.018 | TFLOPs: 22.81 | 31: iteration 166300/ 476837 | consumed samples: 42572800 | consumed tokens: 87189094400 | elapsed time per iteration (s): 0.68 | learning rate: 1.528E-04 | 
global batch size: 256 | lm loss: 2.601181E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.007 | TFLOPs: 22.81 | 31: iteration 166400/ 476837 | consumed samples: 42598400 | consumed tokens: 87241523200 | elapsed time per iteration (s): 0.68 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.597808E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.955 | TFLOPs: 22.80 | 31: iteration 166500/ 476837 | consumed samples: 42624000 | consumed tokens: 87293952000 | elapsed time per iteration (s): 0.68 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 2.596586E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.948 | TFLOPs: 22.80 | 31: iteration 166600/ 476837 | consumed samples: 42649600 | consumed tokens: 87346380800 | elapsed time per iteration (s): 0.68 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 2.596645E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.717 | TFLOPs: 22.79 | 31: iteration 166700/ 476837 | consumed samples: 42675200 | consumed tokens: 87398809600 | elapsed time per iteration (s): 0.68 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.599445E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.021 | TFLOPs: 22.81 | 31: iteration 166800/ 476837 | consumed samples: 42700800 | consumed tokens: 87451238400 | elapsed time per iteration (s): 0.68 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.597933E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.962 | TFLOPs: 22.81 | 31: iteration 166900/ 476837 | consumed 
samples: 42726400 | consumed tokens: 87503667200 | elapsed time per iteration (s): 0.68 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.596985E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.924 | TFLOPs: 22.80 | 31: iteration 167000/ 476837 | consumed samples: 42752000 | consumed tokens: 87556096000 | elapsed time per iteration (s): 0.68 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.597977E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.890 | TFLOPs: 22.80 | 31: iteration 167100/ 476837 | consumed samples: 42777600 | consumed tokens: 87608524800 | elapsed time per iteration (s): 0.68 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.601091E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.908 | TFLOPs: 22.80 | 31: iteration 167200/ 476837 | consumed samples: 42803200 | consumed tokens: 87660953600 | elapsed time per iteration (s): 0.68 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.601434E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.329 | TFLOPs: 22.77 | 31: iteration 167300/ 476837 | consumed samples: 42828800 | consumed tokens: 87713382400 | elapsed time per iteration (s): 0.68 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.606549E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.785 | TFLOPs: 22.73 | 31: iteration 167400/ 476837 | consumed samples: 42854400 | consumed tokens: 87765811200 | elapsed time per iteration (s): 0.68 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.604041E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 376.861 | TFLOPs: 22.80 | 31: iteration 167500/ 476837 | consumed samples: 42880000 | consumed tokens: 87818240000 | elapsed time per iteration (s): 0.68 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.602731E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.903 | TFLOPs: 22.80 | 31: iteration 167600/ 476837 | consumed samples: 42905600 | consumed tokens: 87870668800 | elapsed time per iteration (s): 0.69 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.596558E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.751 | TFLOPs: 22.43 | 31: iteration 167700/ 476837 | consumed samples: 42931200 | consumed tokens: 87923097600 | elapsed time per iteration (s): 0.68 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.599231E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.772 | TFLOPs: 22.79 | 31: iteration 167800/ 476837 | consumed samples: 42956800 | consumed tokens: 87975526400 | elapsed time per iteration (s): 0.68 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.597401E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.553 | TFLOPs: 22.78 | 31: iteration 167900/ 476837 | consumed samples: 42982400 | consumed tokens: 88027955200 | elapsed time per iteration (s): 0.68 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.604001E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.807 | TFLOPs: 22.80 | 0: [2023-04-27 05:20:05,282] [INFO] [logging.py:68:log_dist] [Rank 0] step=168000, skipped=0, lr=[0.00015191881954114409, 0.00015191881954114409, 
0.00015191881954114409], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 168000/ 476837 | consumed samples: 43008000 | consumed tokens: 88080384000 | elapsed time per iteration (s): 0.68 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 2.598723E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.860 | TFLOPs: 22.80 | 0: steps: 168000 loss: 2.6129 iter time (s): 0.677 samples/sec: 377.922 31: iteration 168100/ 476837 | consumed samples: 43033600 | consumed tokens: 88132812800 | elapsed time per iteration (s): 0.68 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 2.594955E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.732 | TFLOPs: 22.79 | 31: iteration 168200/ 476837 | consumed samples: 43059200 | consumed tokens: 88185241600 | elapsed time per iteration (s): 0.68 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.596953E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.816 | TFLOPs: 22.80 | 31: iteration 168300/ 476837 | consumed samples: 43084800 | consumed tokens: 88237670400 | elapsed time per iteration (s): 0.68 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.597099E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.759 | TFLOPs: 22.79 | 31: iteration 168400/ 476837 | consumed samples: 43110400 | consumed tokens: 88290099200 | elapsed time per iteration (s): 0.68 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.601516E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.753 | TFLOPs: 22.79 | 31: iteration 168500/ 476837 | consumed samples: 43136000 | consumed tokens: 88342528000 | 
elapsed time per iteration (s): 0.69 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.597350E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.407 | TFLOPs: 22.29 | 31: iteration 168600/ 476837 | consumed samples: 43161600 | consumed tokens: 88394956800 | elapsed time per iteration (s): 0.68 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.598882E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.438 | TFLOPs: 22.65 | 31: iteration 168700/ 476837 | consumed samples: 43187200 | consumed tokens: 88447385600 | elapsed time per iteration (s): 0.68 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.602219E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.957 | TFLOPs: 22.62 | 31: iteration 168800/ 476837 | consumed samples: 43212800 | consumed tokens: 88499814400 | elapsed time per iteration (s): 0.69 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.594951E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.848 | TFLOPs: 22.56 | 31: iteration 168900/ 476837 | consumed samples: 43238400 | consumed tokens: 88552243200 | elapsed time per iteration (s): 0.68 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.592529E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.936 | TFLOPs: 22.80 | 31: iteration 169000/ 476837 | consumed samples: 43264000 | consumed tokens: 88604672000 | elapsed time per iteration (s): 0.68 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.598314E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
376.677 | TFLOPs: 22.79 | 31: iteration 169100/ 476837 | consumed samples: 43289600 | consumed tokens: 88657100800 | elapsed time per iteration (s): 0.68 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.597876E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.205 | TFLOPs: 22.76 | 31: iteration 169200/ 476837 | consumed samples: 43315200 | consumed tokens: 88709529600 | elapsed time per iteration (s): 0.69 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.596908E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.596 | TFLOPs: 22.30 | 31: iteration 169300/ 476837 | consumed samples: 43340800 | consumed tokens: 88761958400 | elapsed time per iteration (s): 0.68 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 2.598108E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.556 | TFLOPs: 22.78 | 31: iteration 169400/ 476837 | consumed samples: 43366400 | consumed tokens: 88814387200 | elapsed time per iteration (s): 0.68 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 2.599221E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.610 | TFLOPs: 22.78 | 31: iteration 169500/ 476837 | consumed samples: 43392000 | consumed tokens: 88866816000 | elapsed time per iteration (s): 0.68 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.598528E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.658 | TFLOPs: 22.79 | 31: iteration 169600/ 476837 | consumed samples: 43417600 | consumed tokens: 88919244800 | elapsed time per iteration (s): 0.68 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.602961E+00 | grad 
norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.644 | TFLOPs: 22.79 | 31: iteration 169700/ 476837 | consumed samples: 43443200 | consumed tokens: 88971673600 | elapsed time per iteration (s): 0.68 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.598464E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.730 | TFLOPs: 22.79 | 31: iteration 169800/ 476837 | consumed samples: 43468800 | consumed tokens: 89024102400 | elapsed time per iteration (s): 0.69 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.599422E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.286 | TFLOPs: 22.58 | 31: iteration 169900/ 476837 | consumed samples: 43494400 | consumed tokens: 89076531200 | elapsed time per iteration (s): 0.68 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.599046E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.894 | TFLOPs: 22.80 | 0: [2023-04-27 05:42:49,693] [INFO] [logging.py:68:log_dist] [Rank 0] step=170000, skipped=0, lr=[0.00015085511430184965, 0.00015085511430184965, 0.00015085511430184965], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 170000/ 476837 | consumed samples: 43520000 | consumed tokens: 89128960000 | elapsed time per iteration (s): 0.68 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.596548E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.928 | TFLOPs: 22.80 | 0: steps: 170000 loss: 2.5916 iter time (s): 0.679 samples/sec: 377.176 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 170000 | lm loss value: 
2.905615E+00 | lm loss PPL: 1.827648E+01 | 31: ------------------------------------------------------------------------------------------------- 31: iteration 170100/ 476837 | consumed samples: 43545600 | consumed tokens: 89181388800 | elapsed time per iteration (s): 0.68 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.595102E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.860 | TFLOPs: 22.68 | 31: iteration 170200/ 476837 | consumed samples: 43571200 | consumed tokens: 89233817600 | elapsed time per iteration (s): 0.68 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.597339E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.591 | TFLOPs: 22.72 | 31: iteration 170300/ 476837 | consumed samples: 43596800 | consumed tokens: 89286246400 | elapsed time per iteration (s): 0.68 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.594574E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.551 | TFLOPs: 22.72 | 31: iteration 170400/ 476837 | consumed samples: 43622400 | consumed tokens: 89338675200 | elapsed time per iteration (s): 0.68 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.598135E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.847 | TFLOPs: 22.74 | 31: iteration 170500/ 476837 | consumed samples: 43648000 | consumed tokens: 89391104000 | elapsed time per iteration (s): 0.68 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.596451E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.812 | TFLOPs: 22.80 | 31: iteration 170600/ 476837 | consumed samples: 43673600 | consumed tokens: 89443532800 | 
elapsed time per iteration (s): 0.68 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.595816E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.832 | TFLOPs: 22.80 | 31: iteration 170700/ 476837 | consumed samples: 43699200 | consumed tokens: 89495961600 | elapsed time per iteration (s): 0.69 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.601569E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.526 | TFLOPs: 22.36 | 31: iteration 170800/ 476837 | consumed samples: 43724800 | consumed tokens: 89548390400 | elapsed time per iteration (s): 0.68 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.594719E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.875 | TFLOPs: 22.80 | 31: iteration 170900/ 476837 | consumed samples: 43750400 | consumed tokens: 89600819200 | elapsed time per iteration (s): 0.68 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.596044E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.898 | TFLOPs: 22.80 | 31: iteration 171000/ 476837 | consumed samples: 43776000 | consumed tokens: 89653248000 | elapsed time per iteration (s): 0.68 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.594895E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.920 | TFLOPs: 22.80 | 31: iteration 171100/ 476837 | consumed samples: 43801600 | consumed tokens: 89705676800 | elapsed time per iteration (s): 0.68 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.594434E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
376.942 | TFLOPs: 22.80 | 31: iteration 171200/ 476837 | consumed samples: 43827200 | consumed tokens: 89758105600 | elapsed time per iteration (s): 0.68 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.595253E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.912 | TFLOPs: 22.80 | 31: iteration 171300/ 476837 | consumed samples: 43852800 | consumed tokens: 89810534400 | elapsed time per iteration (s): 0.68 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.595141E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.994 | TFLOPs: 22.81 | 31: iteration 171400/ 476837 | consumed samples: 43878400 | consumed tokens: 89862963200 | elapsed time per iteration (s): 0.68 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.596224E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.692 | TFLOPs: 22.79 | 31: iteration 171500/ 476837 | consumed samples: 43904000 | consumed tokens: 89915392000 | elapsed time per iteration (s): 0.68 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.593856E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.941 | TFLOPs: 22.80 | 31: iteration 171600/ 476837 | consumed samples: 43929600 | consumed tokens: 89967820800 | elapsed time per iteration (s): 0.68 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.591961E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.388 | TFLOPs: 22.77 | 31: iteration 171700/ 476837 | consumed samples: 43955200 | consumed tokens: 90020249600 | elapsed time per iteration (s): 0.68 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.595560E+00 | grad 
norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.896 | TFLOPs: 22.80 | 31: iteration 171800/ 476837 | consumed samples: 43980800 | consumed tokens: 90072678400 | elapsed time per iteration (s): 0.68 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.593669E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.747 | TFLOPs: 22.67 | 31: iteration 171900/ 476837 | consumed samples: 44006400 | consumed tokens: 90125107200 | elapsed time per iteration (s): 0.68 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.593173E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.060 | TFLOPs: 22.81 | 0: [2023-04-27 06:05:31,001] [INFO] [logging.py:68:log_dist] [Rank 0] step=172000, skipped=0, lr=[0.00014978417155251194, 0.00014978417155251194, 0.00014978417155251194], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 172000/ 476837 | consumed samples: 44032000 | consumed tokens: 90177536000 | elapsed time per iteration (s): 0.68 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.591845E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.047 | TFLOPs: 22.81 | 0: steps: 172000 loss: 2.5550 iter time (s): 0.677 samples/sec: 378.064 31: iteration 172100/ 476837 | consumed samples: 44057600 | consumed tokens: 90229964800 | elapsed time per iteration (s): 0.68 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.597864E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.035 | TFLOPs: 22.81 | 31: iteration 172200/ 476837 | consumed samples: 44083200 | consumed tokens: 90282393600 | elapsed time per iteration (s): 0.68 | learning rate: 1.497E-04 | 
global batch size: 256 | lm loss: 2.594340E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.053 | TFLOPs: 22.81 | 31: iteration 172300/ 476837 | consumed samples: 44108800 | consumed tokens: 90334822400 | elapsed time per iteration (s): 0.69 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.593847E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.910 | TFLOPs: 22.38 | 31: iteration 172400/ 476837 | consumed samples: 44134400 | consumed tokens: 90387251200 | elapsed time per iteration (s): 0.68 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.588269E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.958 | TFLOPs: 22.80 | 31: iteration 172500/ 476837 | consumed samples: 44160000 | consumed tokens: 90439680000 | elapsed time per iteration (s): 0.68 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.594461E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.051 | TFLOPs: 22.81 | 31: iteration 172600/ 476837 | consumed samples: 44185600 | consumed tokens: 90492108800 | elapsed time per iteration (s): 0.68 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.592577E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.032 | TFLOPs: 22.81 | 31: iteration 172700/ 476837 | consumed samples: 44211200 | consumed tokens: 90544537600 | elapsed time per iteration (s): 0.68 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.597279E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.961 | TFLOPs: 22.81 | 31: iteration 172800/ 476837 | consumed 
samples: 44236800 | consumed tokens: 90596966400 | elapsed time per iteration (s): 0.68 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.594433E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.855 | TFLOPs: 22.80 | 31: iteration 172900/ 476837 | consumed samples: 44262400 | consumed tokens: 90649395200 | elapsed time per iteration (s): 0.68 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.595346E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.931 | TFLOPs: 22.74 | 31: iteration 173000/ 476837 | consumed samples: 44288000 | consumed tokens: 90701824000 | elapsed time per iteration (s): 0.68 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.594318E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.247 | TFLOPs: 22.76 | 31: iteration 173100/ 476837 | consumed samples: 44313600 | consumed tokens: 90754252800 | elapsed time per iteration (s): 0.68 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.590778E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.985 | TFLOPs: 22.81 | 31: iteration 173200/ 476837 | consumed samples: 44339200 | consumed tokens: 90806681600 | elapsed time per iteration (s): 0.68 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.596161E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.899 | TFLOPs: 22.80 | 31: iteration 173300/ 476837 | consumed samples: 44364800 | consumed tokens: 90859110400 | elapsed time per iteration (s): 0.68 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.596211E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 376.877 | TFLOPs: 22.80 | 31: iteration 173400/ 476837 | consumed samples: 44390400 | consumed tokens: 90911539200 | elapsed time per iteration (s): 0.68 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.594012E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.860 | TFLOPs: 22.80 | 31: iteration 173500/ 476837 | consumed samples: 44416000 | consumed tokens: 90963968000 | elapsed time per iteration (s): 0.68 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.590821E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.866 | TFLOPs: 22.80 | 31: iteration 173600/ 476837 | consumed samples: 44441600 | consumed tokens: 91016396800 | elapsed time per iteration (s): 0.68 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.593326E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.912 | TFLOPs: 22.80 | 31: iteration 173700/ 476837 | consumed samples: 44467200 | consumed tokens: 91068825600 | elapsed time per iteration (s): 0.68 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.605963E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.844 | TFLOPs: 22.80 | 31: iteration 173800/ 476837 | consumed samples: 44492800 | consumed tokens: 91121254400 | elapsed time per iteration (s): 0.77 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.594498E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 334.365 | TFLOPs: 20.23 | 31: iteration 173900/ 476837 | consumed samples: 44518400 | consumed tokens: 91173683200 | elapsed time per iteration (s): 0.85 | learning rate: 1.488E-04 | global 
batch size: 256 | lm loss: 2.595845E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 301.548 | TFLOPs: 18.24 | 0: [2023-04-27 06:28:36,542] [INFO] [logging.py:68:log_dist] [Rank 0] step=174000, skipped=0, lr=[0.00014870618101133477, 0.00014870618101133477, 0.00014870618101133477], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 174000/ 476837 | consumed samples: 44544000 | consumed tokens: 91226112000 | elapsed time per iteration (s): 0.68 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.594576E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.863 | TFLOPs: 22.80 | 0: steps: 174000 loss: 2.5520 iter time (s): 0.689 samples/sec: 371.404 31: iteration 174100/ 476837 | consumed samples: 44569600 | consumed tokens: 91278540800 | elapsed time per iteration (s): 0.68 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.593912E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.421 | TFLOPs: 22.77 | 31: iteration 174200/ 476837 | consumed samples: 44595200 | consumed tokens: 91330969600 | elapsed time per iteration (s): 0.69 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 2.594336E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.110 | TFLOPs: 22.33 | 31: iteration 174300/ 476837 | consumed samples: 44620800 | consumed tokens: 91383398400 | elapsed time per iteration (s): 0.69 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.593893E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.171 | TFLOPs: 22.58 | 31: iteration 174400/ 476837 | consumed samples: 44646400 | consumed tokens: 91435827200 | elapsed time per 
iteration (s): 0.69 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.589983E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.417 | TFLOPs: 22.29 | 31: iteration 174500/ 476837 | consumed samples: 44672000 | consumed tokens: 91488256000 | elapsed time per iteration (s): 0.69 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.591273E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.762 | TFLOPs: 22.31 | 31: iteration 174600/ 476837 | consumed samples: 44697600 | consumed tokens: 91540684800 | elapsed time per iteration (s): 0.68 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.596262E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.839 | TFLOPs: 22.80 | 31: iteration 174700/ 476837 | consumed samples: 44723200 | consumed tokens: 91593113600 | elapsed time per iteration (s): 0.68 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.596815E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.944 | TFLOPs: 22.80 | 31: iteration 174800/ 476837 | consumed samples: 44748800 | consumed tokens: 91645542400 | elapsed time per iteration (s): 0.68 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.589304E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.846 | TFLOPs: 22.80 | 31: iteration 174900/ 476837 | consumed samples: 44774400 | consumed tokens: 91697971200 | elapsed time per iteration (s): 0.68 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.589884E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.934 | TFLOPs: 
22.68 | 31: iteration 175000/ 476837 | consumed samples: 44800000 | consumed tokens: 91750400000 | elapsed time per iteration (s): 0.68 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.587342E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.813 | TFLOPs: 22.80 | 31: iteration 175100/ 476837 | consumed samples: 44825600 | consumed tokens: 91802828800 | elapsed time per iteration (s): 0.68 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.597905E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.815 | TFLOPs: 22.80 | 31: iteration 175200/ 476837 | consumed samples: 44851200 | consumed tokens: 91855257600 | elapsed time per iteration (s): 0.70 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.591598E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 367.728 | TFLOPs: 22.25 | 31: iteration 175300/ 476837 | consumed samples: 44876800 | consumed tokens: 91907686400 | elapsed time per iteration (s): 0.68 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.592410E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.848 | TFLOPs: 22.80 | 31: iteration 175400/ 476837 | consumed samples: 44902400 | consumed tokens: 91960115200 | elapsed time per iteration (s): 0.68 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.591217E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.698 | TFLOPs: 22.67 | 31: iteration 175500/ 476837 | consumed samples: 44928000 | consumed tokens: 92012544000 | elapsed time per iteration (s): 0.76 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.591832E+00 | grad norm: 0.277 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 336.220 | TFLOPs: 20.34 | 31: iteration 175600/ 476837 | consumed samples: 44953600 | consumed tokens: 92064972800 | elapsed time per iteration (s): 0.68 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.590879E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.789 | TFLOPs: 22.79 | 31: iteration 175700/ 476837 | consumed samples: 44979200 | consumed tokens: 92117401600 | elapsed time per iteration (s): 0.68 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.591847E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.856 | TFLOPs: 22.80 | 31: iteration 175800/ 476837 | consumed samples: 45004800 | consumed tokens: 92169830400 | elapsed time per iteration (s): 0.68 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.590306E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.486 | TFLOPs: 22.66 | 31: iteration 175900/ 476837 | consumed samples: 45030400 | consumed tokens: 92222259200 | elapsed time per iteration (s): 0.68 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.593605E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.772 | TFLOPs: 22.79 | 0: [2023-04-27 06:51:31,688] [INFO] [logging.py:68:log_dist] [Rank 0] step=176000, skipped=0, lr=[0.00014762133364504298, 0.00014762133364504298, 0.00014762133364504298], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 176000/ 476837 | consumed samples: 45056000 | consumed tokens: 92274688000 | elapsed time per iteration (s): 0.68 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.594350E+00 | grad norm: 0.296 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.656 | TFLOPs: 22.73 | 0: steps: 176000 loss: 2.5599 iter time (s): 0.684 samples/sec: 374.177 31: iteration 176100/ 476837 | consumed samples: 45081600 | consumed tokens: 92327116800 | elapsed time per iteration (s): 0.69 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.591217E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.416 | TFLOPs: 22.53 | 31: iteration 176200/ 476837 | consumed samples: 45107200 | consumed tokens: 92379545600 | elapsed time per iteration (s): 0.68 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.591646E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.826 | TFLOPs: 22.80 | 31: iteration 176300/ 476837 | consumed samples: 45132800 | consumed tokens: 92431974400 | elapsed time per iteration (s): 0.68 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.592496E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.905 | TFLOPs: 22.80 | 31: iteration 176400/ 476837 | consumed samples: 45158400 | consumed tokens: 92484403200 | elapsed time per iteration (s): 0.71 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.594659E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 362.296 | TFLOPs: 21.92 | 31: iteration 176500/ 476837 | consumed samples: 45184000 | consumed tokens: 92536832000 | elapsed time per iteration (s): 0.68 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.591136E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.793 | TFLOPs: 22.79 | 31: iteration 176600/ 476837 | consumed samples: 45209600 | 
consumed tokens: 92589260800 | elapsed time per iteration (s): 0.69 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.585691E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.603 | TFLOPs: 22.42 | 31: iteration 176700/ 476837 | consumed samples: 45235200 | consumed tokens: 92641689600 | elapsed time per iteration (s): 0.68 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.598967E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.462 | TFLOPs: 22.78 | 31: iteration 176800/ 476837 | consumed samples: 45260800 | consumed tokens: 92694118400 | elapsed time per iteration (s): 0.68 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.591385E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.977 | TFLOPs: 22.81 | 31: iteration 176900/ 476837 | consumed samples: 45286400 | consumed tokens: 92746547200 | elapsed time per iteration (s): 0.68 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.591577E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.959 | TFLOPs: 22.81 | 31: iteration 177000/ 476837 | consumed samples: 45312000 | consumed tokens: 92798976000 | elapsed time per iteration (s): 0.68 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.590813E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.964 | TFLOPs: 22.81 | 31: iteration 177100/ 476837 | consumed samples: 45337600 | consumed tokens: 92851404800 | elapsed time per iteration (s): 0.70 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.590869E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 365.491 | TFLOPs: 22.11 | 31: iteration 177200/ 476837 | consumed samples: 45363200 | consumed tokens: 92903833600 | elapsed time per iteration (s): 0.68 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.592666E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.017 | TFLOPs: 22.81 | 31: iteration 177300/ 476837 | consumed samples: 45388800 | consumed tokens: 92956262400 | elapsed time per iteration (s): 0.68 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.589975E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.988 | TFLOPs: 22.81 | 31: iteration 177400/ 476837 | consumed samples: 45414400 | consumed tokens: 93008691200 | elapsed time per iteration (s): 0.68 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.592306E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.972 | TFLOPs: 22.81 | 31: iteration 177500/ 476837 | consumed samples: 45440000 | consumed tokens: 93061120000 | elapsed time per iteration (s): 0.69 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.594635E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.246 | TFLOPs: 22.40 | 31: iteration 177600/ 476837 | consumed samples: 45465600 | consumed tokens: 93113548800 | elapsed time per iteration (s): 0.68 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.584809E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.794 | TFLOPs: 22.80 | 31: iteration 177700/ 476837 | consumed samples: 45491200 | consumed tokens: 93165977600 | elapsed time per iteration (s): 0.68 | learning rate: 1.467E-04 | global batch size: 
256 | lm loss: 2.588870E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.980 | TFLOPs: 22.81 | 31: iteration 177800/ 476837 | consumed samples: 45516800 | consumed tokens: 93218406400 | elapsed time per iteration (s): 0.68 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.589562E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.995 | TFLOPs: 22.81 | 31: iteration 177900/ 476837 | consumed samples: 45542400 | consumed tokens: 93270835200 | elapsed time per iteration (s): 0.68 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.588216E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.503 | TFLOPs: 22.78 | 0: [2023-04-27 07:14:19,078] [INFO] [logging.py:68:log_dist] [Rank 0] step=178000, skipped=0, lr=[0.0001465298216350523, 0.0001465298216350523, 0.0001465298216350523], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 178000/ 476837 | consumed samples: 45568000 | consumed tokens: 93323264000 | elapsed time per iteration (s): 0.69 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.592017E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.492 | TFLOPs: 22.53 | 0: steps: 178000 loss: 2.6472 iter time (s): 0.680 samples/sec: 376.304 31: iteration 178100/ 476837 | consumed samples: 45593600 | consumed tokens: 93375692800 | elapsed time per iteration (s): 0.68 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.589879E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.924 | TFLOPs: 22.80 | 31: iteration 178200/ 476837 | consumed samples: 45619200 | consumed tokens: 93428121600 | elapsed time per iteration (s): 0.68 
| learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.589724E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.445 | TFLOPs: 22.77 | 31: iteration 178300/ 476837 | consumed samples: 45644800 | consumed tokens: 93480550400 | elapsed time per iteration (s): 0.68 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.586019E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.964 | TFLOPs: 22.81 | 31: iteration 178400/ 476837 | consumed samples: 45670400 | consumed tokens: 93532979200 | elapsed time per iteration (s): 0.68 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.587551E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.915 | TFLOPs: 22.80 | 31: iteration 178500/ 476837 | consumed samples: 45696000 | consumed tokens: 93585408000 | elapsed time per iteration (s): 0.68 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.589024E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.991 | TFLOPs: 22.81 | 31: iteration 178600/ 476837 | consumed samples: 45721600 | consumed tokens: 93637836800 | elapsed time per iteration (s): 0.68 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 2.591867E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.423 | TFLOPs: 22.65 | 31: iteration 178700/ 476837 | consumed samples: 45747200 | consumed tokens: 93690265600 | elapsed time per iteration (s): 0.68 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.592582E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.158 | TFLOPs: 22.76 | 31: iteration 
178800/ 476837 | consumed samples: 45772800 | consumed tokens: 93742694400 | elapsed time per iteration (s): 0.70 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.589420E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.140 | TFLOPs: 22.15 | 31: iteration 178900/ 476837 | consumed samples: 45798400 | consumed tokens: 93795123200 | elapsed time per iteration (s): 0.68 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.591828E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.047 | TFLOPs: 22.81 | 31: iteration 179000/ 476837 | consumed samples: 45824000 | consumed tokens: 93847552000 | elapsed time per iteration (s): 0.68 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.594433E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.036 | TFLOPs: 22.81 | 31: iteration 179100/ 476837 | consumed samples: 45849600 | consumed tokens: 93899980800 | elapsed time per iteration (s): 0.68 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.589924E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.248 | TFLOPs: 22.76 | 31: iteration 179200/ 476837 | consumed samples: 45875200 | consumed tokens: 93952409600 | elapsed time per iteration (s): 0.68 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.591547E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.579 | TFLOPs: 22.78 | 31: iteration 179300/ 476837 | consumed samples: 45900800 | consumed tokens: 94004838400 | elapsed time per iteration (s): 0.68 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.587155E+00 | grad norm: 0.288 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.906 | TFLOPs: 22.80 | 31: iteration 179400/ 476837 | consumed samples: 45926400 | consumed tokens: 94057267200 | elapsed time per iteration (s): 0.68 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.587375E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.943 | TFLOPs: 22.80 | 31: iteration 179500/ 476837 | consumed samples: 45952000 | consumed tokens: 94109696000 | elapsed time per iteration (s): 0.68 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.589794E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.914 | TFLOPs: 22.80 | 31: iteration 179600/ 476837 | consumed samples: 45977600 | consumed tokens: 94162124800 | elapsed time per iteration (s): 0.68 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.587589E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.152 | TFLOPs: 22.70 | 31: iteration 179700/ 476837 | consumed samples: 46003200 | consumed tokens: 94214553600 | elapsed time per iteration (s): 0.68 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.587743E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.680 | TFLOPs: 22.79 | 31: iteration 179800/ 476837 | consumed samples: 46028800 | consumed tokens: 94266982400 | elapsed time per iteration (s): 0.68 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.587758E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.874 | TFLOPs: 22.80 | 31: iteration 179900/ 476837 | consumed samples: 46054400 | consumed tokens: 94319411200 | elapsed time per iteration (s): 0.73 | learning 
rate: 1.455E-04 | global batch size: 256 | lm loss: 2.587807E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 352.842 | TFLOPs: 21.35 | 0: [2023-04-27 07:37:05,621] [INFO] [logging.py:68:log_dist] [Rank 0] step=180000, skipped=0, lr=[0.00014543183834342453, 0.00014543183834342453, 0.00014543183834342453], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 180000/ 476837 | consumed samples: 46080000 | consumed tokens: 94371840000 | elapsed time per iteration (s): 0.68 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.589841E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.912 | TFLOPs: 22.68 | 0: steps: 180000 loss: 2.6236 iter time (s): 0.680 samples/sec: 376.618 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 180000 | lm loss value: 2.949950E+00 | lm loss PPL: 1.910499E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 180000 to checkpoints_1b1250b1b5 0: [2023-04-27 07:37:05,982] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step180000 is begin to save! 0: [2023-04-27 07:37:05,989] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_01-model_00-model_states.pt... 0: [2023-04-27 07:37:06,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_01-model_00-model_states.pt. 0: [2023-04-27 07:37:06,300] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_03-model_00-model_states.pt... 
0: [2023-04-27 07:37:06,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_03-model_00-model_states.pt. 0: [2023-04-27 07:37:06,394] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_04-model_00-model_states.pt... 0: [2023-04-27 07:37:06,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_04-model_00-model_states.pt. 0: [2023-04-27 07:37:06,472] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_05-model_00-model_states.pt... 0: [2023-04-27 07:37:06,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_05-model_00-model_states.pt. 0: [2023-04-27 07:37:06,565] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_06-model_00-model_states.pt... 0: [2023-04-27 07:37:06,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_06-model_00-model_states.pt. 0: [2023-04-27 07:37:06,640] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_07-model_00-model_states.pt... 0: [2023-04-27 07:37:06,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_07-model_00-model_states.pt. 0: [2023-04-27 07:37:06,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_08-model_00-model_states.pt... 0: [2023-04-27 07:37:06,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_08-model_00-model_states.pt. 0: [2023-04-27 07:37:06,801] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_09-model_00-model_states.pt... 
0: [2023-04-27 07:37:06,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_09-model_00-model_states.pt. 0: [2023-04-27 07:37:06,890] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_10-model_00-model_states.pt... 0: [2023-04-27 07:37:06,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_10-model_00-model_states.pt. 0: [2023-04-27 07:37:06,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_11-model_00-model_states.pt... 0: [2023-04-27 07:37:07,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_11-model_00-model_states.pt. 0: [2023-04-27 07:37:07,070] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_12-model_00-model_states.pt... 0: [2023-04-27 07:37:07,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_12-model_00-model_states.pt. 0: [2023-04-27 07:37:07,145] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_13-model_00-model_states.pt... 0: [2023-04-27 07:37:07,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_13-model_00-model_states.pt. 0: [2023-04-27 07:37:07,232] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_14-model_00-model_states.pt... 0: [2023-04-27 07:37:07,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_14-model_00-model_states.pt. 0: [2023-04-27 07:37:07,307] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_15-model_00-model_states.pt... 
0: [2023-04-27 07:37:07,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_15-model_00-model_states.pt. 0: [2023-04-27 07:37:07,395] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_16-model_00-model_states.pt... 0: [2023-04-27 07:37:07,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_16-model_00-model_states.pt. 0: [2023-04-27 07:37:07,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_17-model_00-model_states.pt... 0: [2023-04-27 07:37:07,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_17-model_00-model_states.pt. 0: [2023-04-27 07:37:07,573] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_18-model_00-model_states.pt... 0: [2023-04-27 07:37:07,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_18-model_00-model_states.pt. 0: [2023-04-27 07:37:07,659] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_19-model_00-model_states.pt... 0: [2023-04-27 07:37:07,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_19-model_00-model_states.pt. 0: [2023-04-27 07:37:07,747] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_20-model_00-model_states.pt... 0: [2023-04-27 07:37:07,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_20-model_00-model_states.pt. 0: [2023-04-27 07:37:07,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_21-model_00-model_states.pt... 
0: [2023-04-27 07:37:07,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_21-model_00-model_states.pt. 0: [2023-04-27 07:37:07,922] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_22-model_00-model_states.pt... 0: [2023-04-27 07:37:07,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_22-model_00-model_states.pt. 0: [2023-04-27 07:37:07,996] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_23-model_00-model_states.pt... 0: [2023-04-27 07:37:08,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_23-model_00-model_states.pt. 0: [2023-04-27 07:37:08,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_24-model_00-model_states.pt... 0: [2023-04-27 07:37:08,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_24-model_00-model_states.pt. 0: [2023-04-27 07:37:08,170] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_25-model_00-model_states.pt... 0: [2023-04-27 07:37:08,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_25-model_00-model_states.pt. 0: [2023-04-27 07:37:08,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_26-model_00-model_states.pt... 0: [2023-04-27 07:37:08,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_26-model_00-model_states.pt. 0: [2023-04-27 07:37:08,330] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_27-model_00-model_states.pt... 
0: [2023-04-27 07:37:08,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_27-model_00-model_states.pt. 0: [2023-04-27 07:37:08,403] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_28-model_00-model_states.pt... 0: [2023-04-27 07:37:08,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_28-model_00-model_states.pt. 0: [2023-04-27 07:37:08,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/layer_30-model_00-model_states.pt... 0: [2023-04-27 07:37:08,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/layer_30-model_00-model_states.pt. 0: [2023-04-27 07:37:08,495] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step180000/mp_rank_00_model_states.pt 0: [2023-04-27 07:37:08,495] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/mp_rank_00_model_states.pt... 0: [2023-04-27 07:37:08,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/mp_rank_00_model_states.pt. 0: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 4: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 7: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 3: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 15: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 25: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 31: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 
31: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 31: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 16: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 16: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 16: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 22: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 22: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 22: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 0: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
4: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 1: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 5: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 8: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 11: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 11: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 
3: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 10: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 10: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 10: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 9: [2023-04-27 07:37:08,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 9: [2023-04-27 07:37:08,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 14: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 14: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 14: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 14: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 15: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 
12: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 12: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 13: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 13: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 20: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 20: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 20: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 19: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 18: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 
18: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 18: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 24: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 24: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 24: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 17: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 17: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 17: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 27: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 
27: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 21: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 21: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 21: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 23: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 23: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 23: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 25: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 25: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 
28: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 28: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 26: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 30: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 6: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 2: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 8: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 8: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 8: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 11: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 
11: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 3: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 10: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 10: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 10: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 10: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 9: [2023-04-27 07:37:08,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 9: [2023-04-27 07:37:08,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 14: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 14: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 
15: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 12: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 13: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 20: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 19: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 19: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 18: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 24: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 17: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 27: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 
21: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 21: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 23: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 23: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 23: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 23: [2023-04-27 07:37:08,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 29: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 29: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 29: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 29: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 
25: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 25: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 28: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 28: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 26: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 30: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 30: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 30: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 31: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 31: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 
31: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 31: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 31: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 16: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 22: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 22: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 6: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 4: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 2: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 8: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 11: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 11: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 3: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 10: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 9: [2023-04-27 07:37:08,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 9: [2023-04-27 07:37:08,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 
14: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 14: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 15: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 12: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 12: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 12: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 13: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 13: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 13: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 20: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 20: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 
20: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 20: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 19: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 19: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 19: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 19: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 19: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 18: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 18: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 18: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 
24: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 24: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 17: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 17: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 27: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 27: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 21: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 23: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 29: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 
29: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 29: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 25: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 28: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 26: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 30: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 30: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 30: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 16: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 16: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 
22: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 22: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 6: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 1: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
8: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 8: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 11: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 3: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 9: [2023-04-27 07:37:08,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 15: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 15: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 15: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 12: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 13: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 18: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 
24: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 17: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 27: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 21: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 25: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 28: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 28: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 26: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 26: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 26: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 
30: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 16: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 16: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 6: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 4: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 5: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 2: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 8: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 11: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 
9: [2023-04-27 07:37:08,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 15: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 12: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 24: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 17: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 27: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 28: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 26: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 6: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-04-27 07:37:08,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2023-04-27 07:37:08,652] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-04-27 07:37:08,652] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-27 07:37:08,652] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 1: [2023-04-27 07:37:08,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-04-27 07:37:08,659] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-27 07:37:08,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-27 07:37:08,659] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 1: [2023-04-27 07:37:08,659] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-27 07:37:08,659] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 11: [2023-04-27 07:37:08,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-27 07:37:08,659] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-27 07:37:08,659] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 26: [2023-04-27 07:37:08,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 
26: [2023-04-27 07:37:08,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2023-04-27 07:37:08,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 29: [2023-04-27 07:37:08,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-27 07:37:08,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-27 07:37:08,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 29: [2023-04-27 07:37:08,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-27 07:37:08,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-27 07:37:08,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 21: [2023-04-27 07:37:08,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2023-04-27 07:37:08,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-27 07:37:08,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 16: [2023-04-27 07:37:08,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 
16: [2023-04-27 07:37:08,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-27 07:37:08,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 19: [2023-04-27 07:37:08,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 19: [2023-04-27 07:37:08,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-27 07:37:08,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 0: [2023-04-27 07:37:08,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-27 07:37:08,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-27 07:37:08,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 4: [2023-04-27 07:37:08,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 5: [2023-04-27 07:37:08,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 16: [2023-04-27 07:37:08,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 
16: [2023-04-27 07:37:08,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-27 07:37:08,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 0: [2023-04-27 07:37:08,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-04-27 07:37:08,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-04-27 07:37:08,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-27 07:37:08,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-27 07:37:08,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 0: [2023-04-27 07:37:08,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-27 07:37:08,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-27 07:37:08,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 0: [2023-04-27 07:37:08,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 26: [2023-04-27 07:37:08,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 
26: [2023-04-27 07:37:08,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-27 07:37:08,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 26: [2023-04-27 07:37:08,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2023-04-27 07:37:08,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-27 07:37:08,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 0: [2023-04-27 07:37:08,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-27 07:37:08,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-27 07:37:08,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-27 07:37:08,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-27 07:37:08,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 0: [2023-04-27 07:37:08,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
5: [2023-04-27 07:37:08,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-27 07:37:08,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 5: [2023-04-27 07:37:08,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-04-27 07:37:08,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-27 07:37:08,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 5: [2023-04-27 07:37:08,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-04-27 07:37:08,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-27 07:37:08,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 19: [2023-04-27 07:37:08,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-27 07:37:08,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-27 07:37:08,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 1: [2023-04-27 07:37:08,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-04-27 07:37:08,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-27 07:37:08,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 16: [2023-04-27 07:37:08,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-27 07:37:08,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-27 07:37:08,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 7: [2023-04-27 07:37:08,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-04-27 07:37:08,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-27 07:37:08,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 7: [2023-04-27 07:37:08,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-04-27 07:37:08,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-04-27 07:37:08,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-27 07:37:08,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-27 07:37:08,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 7: [2023-04-27 07:37:08,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 29: [2023-04-27 07:37:08,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 29: [2023-04-27 07:37:08,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-27 07:37:08,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 29: [2023-04-27 07:37:08,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-27 07:37:08,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2023-04-27 07:37:08,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 4: [2023-04-27 07:37:08,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-27 07:37:08,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
26: [2023-04-27 07:37:08,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-27 07:37:08,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-27 07:37:08,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 1: [2023-04-27 07:37:08,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-27 07:37:08,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-27 07:37:08,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 21: [2023-04-27 07:37:08,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 21: [2023-04-27 07:37:08,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-27 07:37:08,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 1: [2023-04-27 07:37:08,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-04-27 07:37:08,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-27 07:37:08,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
1: [2023-04-27 07:37:08,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-27 07:37:08,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-27 07:37:08,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 19: [2023-04-27 07:37:08,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 19: [2023-04-27 07:37:08,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-27 07:37:08,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 11: [2023-04-27 07:37:08,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 11: [2023-04-27 07:37:08,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-27 07:37:08,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 29: [2023-04-27 07:37:08,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-27 07:37:08,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2023-04-27 07:37:08,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
29: [2023-04-27 07:37:08,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-27 07:37:08,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-27 07:37:08,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 11: [2023-04-27 07:37:08,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 11: [2023-04-27 07:37:08,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-27 07:37:08,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-27 07:37:08,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-27 07:37:08,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 11: [2023-04-27 07:37:08,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 21: [2023-04-27 07:37:08,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-27 07:37:08,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-27 07:37:08,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
29: [2023-04-27 07:37:08,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2023-04-27 07:37:08,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-27 07:37:08,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 19: [2023-04-27 07:37:08,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-27 07:37:08,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-27 07:37:08,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 16: [2023-04-27 07:37:08,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 16: [2023-04-27 07:37:08,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-27 07:37:08,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 5: [2023-04-27 07:37:08,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-04-27 07:37:08,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-27 07:37:08,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
5: [2023-04-27 07:37:08,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-04-27 07:37:08,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-27 07:37:08,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 16: [2023-04-27 07:37:08,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-27 07:37:08,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-27 07:37:08,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 26: [2023-04-27 07:37:08,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 19: [2023-04-27 07:37:08,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-27 07:37:08,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 
26: [2023-04-27 07:37:08,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 19: [2023-04-27 07:37:08,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-27 07:37:08,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-27 07:37:08,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 26: [2023-04-27 07:37:08,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 19: [2023-04-27 07:37:08,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 19: [2023-04-27 07:37:08,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2023-04-27 07:37:08,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-27 07:37:08,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 26: [2023-04-27 07:37:08,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-27 07:37:08,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-27 07:37:08,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
26: [2023-04-27 07:37:08,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-27 07:37:08,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-27 07:37:08,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 5: [2023-04-27 07:37:08,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 19: [2023-04-27 07:37:08,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2023-04-27 07:37:08,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-27 07:37:08,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 7: [2023-04-27 07:37:08,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-04-27 07:37:08,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-27 07:37:08,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-04-27 07:37:08,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-27 07:37:08,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 7: [2023-04-27 07:37:08,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 7: [2023-04-27 07:37:08,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-27 07:37:08,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-27 07:37:08,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 26: [2023-04-27 07:37:08,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-27 07:37:08,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-27 07:37:08,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 11: [2023-04-27 07:37:08,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2023-04-27 07:37:08,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 
11: [2023-04-27 07:37:08,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-27 07:37:08,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2023-04-27 07:37:08,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 11: [2023-04-27 07:37:08,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 11: [2023-04-27 07:37:08,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-27 07:37:08,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-27 07:37:08,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 11: [2023-04-27 07:37:08,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-27 07:37:08,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2023-04-27 07:37:08,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 5: [2023-04-27 07:37:08,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-27 07:37:08,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
5: [2023-04-27 07:37:08,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-27 07:37:08,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-04-27 07:37:08,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 4: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 0: [2023-04-27 07:37:08,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 16: [2023-04-27 07:37:08,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2023-04-27 07:37:08,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-27 07:37:08,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-27 07:37:08,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-27 07:37:08,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
16: [2023-04-27 07:37:08,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 1: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 21: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2023-04-27 07:37:08,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 1: [2023-04-27 07:37:08,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 1: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 21: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 1: [2023-04-27 07:37:08,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 
8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 8: [2023-04-27 07:37:08,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2023-04-27 07:37:08,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-27 07:37:08,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-27 07:37:08,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-27 07:37:08,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-27 07:37:08,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 8: [2023-04-27 07:37:08,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 8: [2023-04-27 07:37:08,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 7: [2023-04-27 07:37:08,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-27 07:37:08,755] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-04-27 07:37:08,755] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 21: [2023-04-27 07:37:08,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 
21: [2023-04-27 07:37:08,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 21: [2023-04-27 07:37:08,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-27 07:37:08,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-27 07:37:08,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 21: [2023-04-27 07:37:08,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 21: [2023-04-27 07:37:08,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 21: [2023-04-27 07:37:08,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-27 07:37:08,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 29: [2023-04-27 07:37:08,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 29: [2023-04-27 07:37:08,758] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 29: [2023-04-27 07:37:08,758] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 16: [2023-04-27 07:37:08,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 
16: [2023-04-27 07:37:08,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-27 07:37:08,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 4: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-04-27 07:37:08,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-04-27 07:37:08,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-04-27 07:37:08,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 4: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 4: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 4: [2023-04-27 07:37:08,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-04-27 07:37:08,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-04-27 07:37:08,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
4: [2023-04-27 07:37:08,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-04-27 07:37:08,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-27 07:37:08,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 6: [2023-04-27 07:37:08,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-27 07:37:08,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-27 07:37:08,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-27 07:37:08,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-27 07:37:08,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-27 07:37:08,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-27 07:37:08,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 6: [2023-04-27 07:37:08,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 6: [2023-04-27 07:37:08,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
17: [2023-04-27 07:37:08,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 17: [2023-04-27 07:37:08,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2023-04-27 07:37:08,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2023-04-27 07:37:08,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2023-04-27 07:37:08,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-27 07:37:08,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-27 07:37:08,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 17: [2023-04-27 07:37:08,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 17: [2023-04-27 07:37:08,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 4: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 
18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
18: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 23: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 23: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 23: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 23: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 23: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 18: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 23: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 23: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 23: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
23: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 23: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 17: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 17: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2023-04-27 07:37:08,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 17: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 17: [2023-04-27 07:37:08,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 21: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 
21: [2023-04-27 07:37:08,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 17: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 17: [2023-04-27 07:37:08,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 6: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-04-27 07:37:08,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-27 07:37:08,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
6: [2023-04-27 07:37:08,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 17: [2023-04-27 07:37:08,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 17: [2023-04-27 07:37:08,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-27 07:37:08,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 6: [2023-04-27 07:37:08,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-27 07:37:08,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 23: [2023-04-27 07:37:08,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 23: [2023-04-27 07:37:08,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2023-04-27 07:37:08,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 
23: [2023-04-27 07:37:08,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-27 07:37:08,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2023-04-27 07:37:08,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-27 07:37:08,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 23: [2023-04-27 07:37:08,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 23: [2023-04-27 07:37:08,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 25: [2023-04-27 07:37:08,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 
25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-27 07:37:08,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
25: [2023-04-27 07:37:08,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-27 07:37:08,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-27 07:37:08,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-27 07:37:08,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 25: [2023-04-27 07:37:08,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-27 07:37:08,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
25: [2023-04-27 07:37:08,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 6: [2023-04-27 07:37:08,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-04-27 07:37:08,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-27 07:37:08,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 4: [2023-04-27 07:37:08,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-27 07:37:08,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 7: [2023-04-27 07:37:08,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-04-27 07:37:08,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-27 07:37:08,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 10: [2023-04-27 07:37:08,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 10: [2023-04-27 07:37:08,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-27 07:37:08,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 
10: [2023-04-27 07:37:08,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-27 07:37:08,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 10: [2023-04-27 07:37:08,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 10: [2023-04-27 07:37:08,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-27 07:37:08,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-27 07:37:08,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2023-04-27 07:37:08,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 10: [2023-04-27 07:37:08,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-27 07:37:08,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2023-04-27 07:37:08,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 10: [2023-04-27 07:37:08,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
10: [2023-04-27 07:37:08,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 10: [2023-04-27 07:37:08,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 10: [2023-04-27 07:37:08,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 10: [2023-04-27 07:37:08,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 10: [2023-04-27 07:37:08,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-27 07:37:08,798] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2023-04-27 07:37:08,798] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 10: [2023-04-27 07:37:08,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-27 07:37:08,798] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 5: [2023-04-27 07:37:08,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 10: [2023-04-27 07:37:08,798] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 5: [2023-04-27 07:37:08,798] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-27 07:37:08,798] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 
22: [2023-04-27 07:37:08,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-27 07:37:08,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-27 07:37:08,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2023-04-27 07:37:08,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-27 07:37:08,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-27 07:37:08,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-27 07:37:08,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-27 07:37:08,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 22: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 18: [2023-04-27 07:37:08,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-27 07:37:08,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-27 07:37:08,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 4: [2023-04-27 07:37:08,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-27 07:37:08,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-04-27 07:37:08,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 6: [2023-04-27 07:37:08,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-04-27 07:37:08,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-27 07:37:08,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 
9: [2023-04-27 07:37:08,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-27 07:37:08,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-27 07:37:08,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-27 07:37:08,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 9: [2023-04-27 07:37:08,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-27 07:37:08,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-27 07:37:08,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-27 07:37:08,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 9: [2023-04-27 07:37:08,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 
24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2023-04-27 07:37:08,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-27 07:37:08,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-27 07:37:08,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-27 07:37:08,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-27 07:37:08,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-27 07:37:08,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2023-04-27 07:37:08,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-27 07:37:08,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 24: [2023-04-27 07:37:08,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 
30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 30: [2023-04-27 07:37:08,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-27 07:37:08,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-27 07:37:08,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-27 07:37:08,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-27 07:37:08,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-27 07:37:08,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-27 07:37:08,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 30: [2023-04-27 07:37:08,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 
14: [2023-04-27 07:37:08,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-27 07:37:08,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-27 07:37:08,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-27 07:37:08,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-27 07:37:08,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2023-04-27 07:37:08,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 14: [2023-04-27 07:37:08,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 14: [2023-04-27 07:37:08,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 14: [2023-04-27 07:37:09,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-27 07:37:09,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-27 07:37:09,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 3: [2023-04-27 07:37:09,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-27 07:37:09,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-04-27 07:37:09,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-27 07:37:09,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-04-27 07:37:09,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-27 07:37:09,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-27 07:37:09,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-27 07:37:09,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-27 07:37:09,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 3: [2023-04-27 07:37:09,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 3: [2023-04-27 07:37:09,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 3: [2023-04-27 07:37:09,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 
13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-27 07:37:09,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-27 07:37:09,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 
13: [2023-04-27 07:37:09,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-27 07:37:09,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-27 07:37:09,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-27 07:37:09,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 13: [2023-04-27 07:37:09,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 13: [2023-04-27 07:37:09,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 13: [2023-04-27 07:37:09,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 3: [2023-04-27 07:37:09,059] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-04-27 07:37:09,059] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-27 07:37:09,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-27 07:37:09,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 3: [2023-04-27 07:37:09,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-27 07:37:09,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 
28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 28: [2023-04-27 07:37:09,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-27 07:37:09,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-27 07:37:09,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-27 07:37:09,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-27 07:37:09,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-27 07:37:09,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-27 07:37:09,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt
28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 28: [2023-04-27 07:37:09,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 28: [2023-04-27 07:37:09,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-27 07:37:09,071] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-27 07:37:09,071] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 15: [2023-04-27 07:37:09,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-27 07:37:09,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 15: [2023-04-27 07:37:09,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt.
15: [2023-04-27 07:37:09,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 15: [2023-04-27 07:37:09,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2023-04-27 07:37:09,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-27 07:37:09,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-27 07:37:09,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-27 07:37:09,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 15: [2023-04-27 07:37:09,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 15: [2023-04-27 07:37:09,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 15: [2023-04-27 07:37:09,076] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 15: [2023-04-27 07:37:09,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 15: [2023-04-27 07:37:09,078] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-27 07:37:09,078] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
15: [2023-04-27 07:37:09,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-27 07:37:09,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-27 07:37:09,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 15: [2023-04-27 07:37:09,078] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-27 07:37:09,078] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2023-04-27 07:37:09,078] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-27 07:37:09,078] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 15: [2023-04-27 07:37:09,078] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 15: [2023-04-27 07:37:09,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 2: [2023-04-27 07:37:09,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-04-27 07:37:09,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-04-27 07:37:09,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-04-27 07:37:09,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-27 07:37:09,081] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-27 07:37:09,081] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-27 07:37:09,081] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-27 07:37:09,081] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-27 07:37:09,081] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 2: [2023-04-27 07:37:09,081] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 2: [2023-04-27 07:37:09,081] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 2: [2023-04-27 07:37:09,081] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 2: [2023-04-27 07:37:09,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-04-27 07:37:09,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-04-27 07:37:09,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-27 07:37:09,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-27 07:37:09,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-04-27 07:37:09,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-27 07:37:09,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 2: [2023-04-27 07:37:09,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 2: [2023-04-27 07:37:09,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 3: [2023-04-27 07:37:09,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-04-27 07:37:09,084] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-27 07:37:09,084] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 0: [2023-04-27 07:37:09,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-04-27 07:37:09,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
3: [2023-04-27 07:37:09,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-27 07:37:09,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-27 07:37:09,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 
31: [2023-04-27 07:37:09,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 31: [2023-04-27 07:37:09,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-27 07:37:09,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-27 07:37:09,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-27 07:37:09,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-27 07:37:09,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 31: [2023-04-27 07:37:09,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-27 07:37:09,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 31: [2023-04-27 07:37:09,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 
27: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 
20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 20: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 27: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 
12: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 27: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 
12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 27: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
20: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 12: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 20: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 20: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 12: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 12: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 20: [2023-04-27 07:37:09,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 
12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 12: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 20: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 27: [2023-04-27 07:37:09,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 8: [2023-04-27 07:37:10,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 8: [2023-04-27 07:37:10,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-27 07:37:10,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 30: [2023-04-27 07:37:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 30: [2023-04-27 07:37:10,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-27 07:37:10,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 2: [2023-04-27 07:37:10,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-04-27 07:37:10,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step180000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-27 07:37:10,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step180000 is ready now! 0: successfully saved checkpoint at iteration 180000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 4471.74 31: iteration 180100/ 476837 | consumed samples: 46105600 | consumed tokens: 94424268800 | elapsed time per iteration (s): 0.73 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.587722E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 350.757 | TFLOPs: 21.22 | 31: iteration 180200/ 476837 | consumed samples: 46131200 | consumed tokens: 94476697600 | elapsed time per iteration (s): 0.69 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.591556E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.474 | TFLOPs: 22.35 | 31: iteration 180300/ 476837 | consumed samples: 46156800 | consumed tokens: 94529126400 | elapsed time per iteration (s): 0.68 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.588317E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.965 | TFLOPs: 22.81 | 31: iteration 180400/ 476837 | consumed samples: 46182400 | consumed tokens: 94581555200 | elapsed time per iteration (s): 0.70 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.589740E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.577 | TFLOPs: 22.06 | 31: iteration 180500/ 476837 | consumed samples: 46208000 | consumed tokens: 94633984000 | elapsed time per iteration (s): 0.68 | learning rate: 1.452E-04 
| global batch size: 256 | lm loss: 2.595389E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.865 | TFLOPs: 22.74 | 31: iteration 180600/ 476837 | consumed samples: 46233600 | consumed tokens: 94686412800 | elapsed time per iteration (s): 0.68 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 2.586833E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.516 | TFLOPs: 22.72 | 31: iteration 180700/ 476837 | consumed samples: 46259200 | consumed tokens: 94738841600 | elapsed time per iteration (s): 0.68 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.589380E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.910 | TFLOPs: 22.80 | 31: iteration 180800/ 476837 | consumed samples: 46284800 | consumed tokens: 94791270400 | elapsed time per iteration (s): 0.68 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.586642E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.953 | TFLOPs: 22.80 | 31: iteration 180900/ 476837 | consumed samples: 46310400 | consumed tokens: 94843699200 | elapsed time per iteration (s): 0.68 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.589963E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.815 | TFLOPs: 22.80 | 31: iteration 181000/ 476837 | consumed samples: 46336000 | consumed tokens: 94896128000 | elapsed time per iteration (s): 0.68 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.588743E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.868 | TFLOPs: 22.80 | 31: iteration 181100/ 476837 | consumed 
samples: 46361600 | consumed tokens: 94948556800 | elapsed time per iteration (s): 0.68 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.585355E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.899 | TFLOPs: 22.80 | 31: iteration 181200/ 476837 | consumed samples: 46387200 | consumed tokens: 95000985600 | elapsed time per iteration (s): 0.68 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.587421E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.950 | TFLOPs: 22.74 | 31: iteration 181300/ 476837 | consumed samples: 46412800 | consumed tokens: 95053414400 | elapsed time per iteration (s): 0.68 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.592944E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.854 | TFLOPs: 22.80 | 31: iteration 181400/ 476837 | consumed samples: 46438400 | consumed tokens: 95105843200 | elapsed time per iteration (s): 0.68 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.586879E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.905 | TFLOPs: 22.80 | 31: iteration 181500/ 476837 | consumed samples: 46464000 | consumed tokens: 95158272000 | elapsed time per iteration (s): 0.68 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.584168E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.437 | TFLOPs: 22.71 | 31: iteration 181600/ 476837 | consumed samples: 46489600 | consumed tokens: 95210700800 | elapsed time per iteration (s): 0.68 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.588671E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 376.807 | TFLOPs: 22.80 | 31: iteration 181700/ 476837 | consumed samples: 46515200 | consumed tokens: 95263129600 | elapsed time per iteration (s): 0.68 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.589204E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.068 | TFLOPs: 22.69 | 31: iteration 181800/ 476837 | consumed samples: 46540800 | consumed tokens: 95315558400 | elapsed time per iteration (s): 0.68 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.599745E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.808 | TFLOPs: 22.80 | 31: iteration 181900/ 476837 | consumed samples: 46566400 | consumed tokens: 95367987200 | elapsed time per iteration (s): 0.68 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.587596E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.148 | TFLOPs: 22.82 | 0: [2023-04-27 07:59:53,977] [INFO] [logging.py:68:log_dist] [Rank 0] step=182000, skipped=0, lr=[0.00014432757827861315, 0.00014432757827861315, 0.00014432757827861315], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 182000/ 476837 | consumed samples: 46592000 | consumed tokens: 95420416000 | elapsed time per iteration (s): 0.68 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.589776E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.001 | TFLOPs: 22.81 | 0: steps: 182000 loss: 2.5790 iter time (s): 0.679 samples/sec: 376.792 31: iteration 182100/ 476837 | consumed samples: 46617600 | consumed tokens: 95472844800 | elapsed time per iteration (s): 0.73 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.583675E+00 | grad norm: 0.309 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 352.723 | TFLOPs: 21.34 | 31: iteration 182200/ 476837 | consumed samples: 46643200 | consumed tokens: 95525273600 | elapsed time per iteration (s): 0.68 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 2.588134E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.108 | TFLOPs: 22.81 | 31: iteration 182300/ 476837 | consumed samples: 46668800 | consumed tokens: 95577702400 | elapsed time per iteration (s): 0.68 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 2.587796E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.770 | TFLOPs: 22.79 | 31: iteration 182400/ 476837 | consumed samples: 46694400 | consumed tokens: 95630131200 | elapsed time per iteration (s): 0.68 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.585259E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.967 | TFLOPs: 22.81 | 31: iteration 182500/ 476837 | consumed samples: 46720000 | consumed tokens: 95682560000 | elapsed time per iteration (s): 0.73 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.590505E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 352.190 | TFLOPs: 21.31 | 31: iteration 182600/ 476837 | consumed samples: 46745600 | consumed tokens: 95734988800 | elapsed time per iteration (s): 0.68 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.587882E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.093 | TFLOPs: 22.81 | 31: iteration 182700/ 476837 | consumed samples: 46771200 | consumed tokens: 95787417600 | elapsed time per 
iteration (s): 0.68 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.587419E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.025 | TFLOPs: 22.81 | 31: iteration 182800/ 476837 | consumed samples: 46796800 | consumed tokens: 95839846400 | elapsed time per iteration (s): 0.68 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.587130E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.282 | TFLOPs: 22.70 | 31: iteration 182900/ 476837 | consumed samples: 46822400 | consumed tokens: 95892275200 | elapsed time per iteration (s): 0.70 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.586239E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.166 | TFLOPs: 22.09 | 31: iteration 183000/ 476837 | consumed samples: 46848000 | consumed tokens: 95944704000 | elapsed time per iteration (s): 0.74 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.587240E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 346.455 | TFLOPs: 20.96 | 31: iteration 183100/ 476837 | consumed samples: 46873600 | consumed tokens: 95997132800 | elapsed time per iteration (s): 0.69 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.582433E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.127 | TFLOPs: 22.45 | 31: iteration 183200/ 476837 | consumed samples: 46899200 | consumed tokens: 96049561600 | elapsed time per iteration (s): 0.68 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.592770E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.043 | TFLOPs: 
22.81 | 31: iteration 183300/ 476837 | consumed samples: 46924800 | consumed tokens: 96101990400 | elapsed time per iteration (s): 0.68 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.588072E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.943 | TFLOPs: 22.80 | 31: iteration 183400/ 476837 | consumed samples: 46950400 | consumed tokens: 96154419200 | elapsed time per iteration (s): 0.68 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.587690E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.982 | TFLOPs: 22.81 | 31: iteration 183500/ 476837 | consumed samples: 46976000 | consumed tokens: 96206848000 | elapsed time per iteration (s): 0.74 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.585278E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 347.707 | TFLOPs: 21.04 | 31: iteration 183600/ 476837 | consumed samples: 47001600 | consumed tokens: 96259276800 | elapsed time per iteration (s): 0.68 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.586454E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.680 | TFLOPs: 22.79 | 31: iteration 183700/ 476837 | consumed samples: 47027200 | consumed tokens: 96311705600 | elapsed time per iteration (s): 0.68 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.584274E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.314 | TFLOPs: 22.77 | 31: iteration 183800/ 476837 | consumed samples: 47052800 | consumed tokens: 96364134400 | elapsed time per iteration (s): 0.70 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.583851E+00 | grad norm: 0.283 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.326 | TFLOPs: 22.04 | 31: iteration 183900/ 476837 | consumed samples: 47078400 | consumed tokens: 96416563200 | elapsed time per iteration (s): 0.68 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.581689E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.041 | TFLOPs: 22.81 | 0: [2023-04-27 08:22:59,349] [INFO] [logging.py:68:log_dist] [Rank 0] step=184000, skipped=0, lr=[0.00014321723706100611, 0.00014321723706100611, 0.00014321723706100611], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 184000/ 476837 | consumed samples: 47104000 | consumed tokens: 96468992000 | elapsed time per iteration (s): 0.68 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.583781E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.098 | TFLOPs: 22.81 | 0: steps: 184000 loss: 2.5501 iter time (s): 0.690 samples/sec: 370.872 31: iteration 184100/ 476837 | consumed samples: 47129600 | consumed tokens: 96521420800 | elapsed time per iteration (s): 0.68 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.580668E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.076 | TFLOPs: 22.81 | 31: iteration 184200/ 476837 | consumed samples: 47155200 | consumed tokens: 96573849600 | elapsed time per iteration (s): 0.68 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 2.586213E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.005 | TFLOPs: 22.81 | 31: iteration 184300/ 476837 | consumed samples: 47180800 | consumed tokens: 96626278400 | elapsed time per iteration (s): 0.70 | learning rate: 1.431E-04 | global batch size: 
256 | lm loss: 2.581248E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.937 | TFLOPs: 22.08 | 31: iteration 184400/ 476837 | consumed samples: 47206400 | consumed tokens: 96678707200 | elapsed time per iteration (s): 0.68 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.582202E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.312 | TFLOPs: 22.77 | 31: iteration 184500/ 476837 | consumed samples: 47232000 | consumed tokens: 96731136000 | elapsed time per iteration (s): 0.68 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.584578E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.200 | TFLOPs: 22.76 | 31: iteration 184600/ 476837 | consumed samples: 47257600 | consumed tokens: 96783564800 | elapsed time per iteration (s): 0.68 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.583446E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.048 | TFLOPs: 22.81 | 31: iteration 184700/ 476837 | consumed samples: 47283200 | consumed tokens: 96835993600 | elapsed time per iteration (s): 0.68 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.582642E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.061 | TFLOPs: 22.81 | 31: iteration 184800/ 476837 | consumed samples: 47308800 | consumed tokens: 96888422400 | elapsed time per iteration (s): 0.68 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.579363E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.043 | TFLOPs: 22.81 | 31: iteration 184900/ 476837 | consumed samples: 47334400 | 
consumed tokens: 96940851200 | elapsed time per iteration (s): 0.68 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.581476E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.979 | TFLOPs: 22.81 | 31: iteration 185000/ 476837 | consumed samples: 47360000 | consumed tokens: 96993280000 | elapsed time per iteration (s): 0.68 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.585748E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.162 | TFLOPs: 22.82 | 31: iteration 185100/ 476837 | consumed samples: 47385600 | consumed tokens: 97045708800 | elapsed time per iteration (s): 0.68 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.588418E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.176 | TFLOPs: 22.82 | 31: iteration 185200/ 476837 | consumed samples: 47411200 | consumed tokens: 97098137600 | elapsed time per iteration (s): 0.68 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.585218E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.924 | TFLOPs: 22.80 | 31: iteration 185300/ 476837 | consumed samples: 47436800 | consumed tokens: 97150566400 | elapsed time per iteration (s): 0.68 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.583877E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.433 | TFLOPs: 22.77 | 31: iteration 185400/ 476837 | consumed samples: 47462400 | consumed tokens: 97202995200 | elapsed time per iteration (s): 0.68 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.584148E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 376.935 | TFLOPs: 22.80 | 31: iteration 185500/ 476837 | consumed samples: 47488000 | consumed tokens: 97255424000 | elapsed time per iteration (s): 0.70 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.578406E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 363.420 | TFLOPs: 21.99 | 31: iteration 185600/ 476837 | consumed samples: 47513600 | consumed tokens: 97307852800 | elapsed time per iteration (s): 0.69 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.588446E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.258 | TFLOPs: 22.40 | 31: iteration 185700/ 476837 | consumed samples: 47539200 | consumed tokens: 97360281600 | elapsed time per iteration (s): 0.68 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.582854E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.052 | TFLOPs: 22.63 | 31: iteration 185800/ 476837 | consumed samples: 47564800 | consumed tokens: 97412710400 | elapsed time per iteration (s): 0.69 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.584445E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.149 | TFLOPs: 22.45 | 31: iteration 185900/ 476837 | consumed samples: 47590400 | consumed tokens: 97465139200 | elapsed time per iteration (s): 0.74 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.575226E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 346.156 | TFLOPs: 20.94 | 0: [2023-04-27 08:45:51,416] [INFO] [logging.py:68:log_dist] [Rank 0] step=186000, skipped=0, lr=[0.00014210101138827157, 0.00014210101138827157, 0.00014210101138827157], 
mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 186000/ 476837 | consumed samples: 47616000 | consumed tokens: 97517568000 | elapsed time per iteration (s): 0.68 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.582540E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.979 | TFLOPs: 22.81 | 0: steps: 186000 loss: 2.5967 iter time (s): 0.684 samples/sec: 374.534 31: iteration 186100/ 476837 | consumed samples: 47641600 | consumed tokens: 97569996800 | elapsed time per iteration (s): 0.68 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.582807E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.007 | TFLOPs: 22.75 | 31: iteration 186200/ 476837 | consumed samples: 47667200 | consumed tokens: 97622425600 | elapsed time per iteration (s): 0.68 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.586612E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.386 | TFLOPs: 22.77 | 31: iteration 186300/ 476837 | consumed samples: 47692800 | consumed tokens: 97674854400 | elapsed time per iteration (s): 0.68 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.583824E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.366 | TFLOPs: 22.71 | 31: iteration 186400/ 476837 | consumed samples: 47718400 | consumed tokens: 97727283200 | elapsed time per iteration (s): 0.68 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.581360E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.300 | TFLOPs: 22.64 | 31: iteration 186500/ 476837 | consumed samples: 47744000 | consumed tokens: 97779712000 | elapsed time per iteration 
(s): 0.68 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.580162E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.081 | TFLOPs: 22.81 | 31: iteration 186600/ 476837 | consumed samples: 47769600 | consumed tokens: 97832140800 | elapsed time per iteration (s): 0.68 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.585908E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.111 | TFLOPs: 22.81 | 31: iteration 186700/ 476837 | consumed samples: 47795200 | consumed tokens: 97884569600 | elapsed time per iteration (s): 0.68 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.583459E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.259 | TFLOPs: 22.64 | 31: iteration 186800/ 476837 | consumed samples: 47820800 | consumed tokens: 97936998400 | elapsed time per iteration (s): 0.68 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.579741E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.286 | TFLOPs: 22.76 | 31: iteration 186900/ 476837 | consumed samples: 47846400 | consumed tokens: 97989427200 | elapsed time per iteration (s): 0.68 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.581387E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.176 | TFLOPs: 22.64 | 31: iteration 187000/ 476837 | consumed samples: 47872000 | consumed tokens: 98041856000 | elapsed time per iteration (s): 0.68 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.580820E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.269 | TFLOPs: 22.64 | 31: 
iteration 187100/ 476837 | consumed samples: 47897600 | consumed tokens: 98094284800 | elapsed time per iteration (s): 0.68 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.583132E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.955 | TFLOPs: 22.80 | 31: iteration 187200/ 476837 | consumed samples: 47923200 | consumed tokens: 98146713600 | elapsed time per iteration (s): 0.68 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.583251E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.104 | TFLOPs: 22.63 | 31: iteration 187300/ 476837 | consumed samples: 47948800 | consumed tokens: 98199142400 | elapsed time per iteration (s): 0.70 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.581058E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.099 | TFLOPs: 22.09 | 31: iteration 187400/ 476837 | consumed samples: 47974400 | consumed tokens: 98251571200 | elapsed time per iteration (s): 0.68 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.581592E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.337 | TFLOPs: 22.77 | 31: iteration 187500/ 476837 | consumed samples: 48000000 | consumed tokens: 98304000000 | elapsed time per iteration (s): 0.70 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.584971E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.022 | TFLOPs: 22.08 | 31: iteration 187600/ 476837 | consumed samples: 48025600 | consumed tokens: 98356428800 | elapsed time per iteration (s): 0.68 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.582177E+00 | grad norm: 0.324 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.974 | TFLOPs: 22.81 | 31: iteration 187700/ 476837 | consumed samples: 48051200 | consumed tokens: 98408857600 | elapsed time per iteration (s): 0.68 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.579403E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.027 | TFLOPs: 22.81 | 31: iteration 187800/ 476837 | consumed samples: 48076800 | consumed tokens: 98461286400 | elapsed time per iteration (s): 0.69 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.581672E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.136 | TFLOPs: 22.57 | 31: iteration 187900/ 476837 | consumed samples: 48102400 | consumed tokens: 98513715200 | elapsed time per iteration (s): 0.68 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.578364E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.994 | TFLOPs: 22.81 | 0: [2023-04-27 09:08:44,546] [INFO] [logging.py:68:log_dist] [Rank 0] step=188000, skipped=0, lr=[0.00014097909900051263, 0.00014097909900051263, 0.00014097909900051263], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 188000/ 476837 | consumed samples: 48128000 | consumed tokens: 98566144000 | elapsed time per iteration (s): 0.74 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.581166E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 343.766 | TFLOPs: 20.80 | 0: steps: 188000 loss: 2.5507 iter time (s): 0.683 samples/sec: 374.647 31: iteration 188100/ 476837 | consumed samples: 48153600 | consumed tokens: 98618572800 | elapsed time per iteration (s): 0.73 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 
2.579753E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 352.814 | TFLOPs: 21.34 | 31: iteration 188200/ 476837 | consumed samples: 48179200 | consumed tokens: 98671001600 | elapsed time per iteration (s): 0.68 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.583761E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.804 | TFLOPs: 22.80 | 31: iteration 188300/ 476837 | consumed samples: 48204800 | consumed tokens: 98723430400 | elapsed time per iteration (s): 0.68 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.580469E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.873 | TFLOPs: 22.80 | 31: iteration 188400/ 476837 | consumed samples: 48230400 | consumed tokens: 98775859200 | elapsed time per iteration (s): 0.68 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.581775E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.920 | TFLOPs: 22.80 | 31: iteration 188500/ 476837 | consumed samples: 48256000 | consumed tokens: 98828288000 | elapsed time per iteration (s): 0.69 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.579186E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.458 | TFLOPs: 22.41 | 31: iteration 188600/ 476837 | consumed samples: 48281600 | consumed tokens: 98880716800 | elapsed time per iteration (s): 0.68 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.583589E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.906 | TFLOPs: 22.80 | 31: iteration 188700/ 476837 | consumed samples: 48307200 | consumed tokens: 
98933145600 | elapsed time per iteration (s): 0.68 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.581701E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.965 | TFLOPs: 22.81 | 31: iteration 188800/ 476837 | consumed samples: 48332800 | consumed tokens: 98985574400 | elapsed time per iteration (s): 0.68 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.581305E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.952 | TFLOPs: 22.80 | 31: iteration 188900/ 476837 | consumed samples: 48358400 | consumed tokens: 99038003200 | elapsed time per iteration (s): 0.68 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.578489E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.997 | TFLOPs: 22.81 | 31: iteration 189000/ 476837 | consumed samples: 48384000 | consumed tokens: 99090432000 | elapsed time per iteration (s): 0.71 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.581170E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.588 | TFLOPs: 21.88 | 31: iteration 189100/ 476837 | consumed samples: 48409600 | consumed tokens: 99142860800 | elapsed time per iteration (s): 0.73 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.580604E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 352.132 | TFLOPs: 21.30 | 31: iteration 189200/ 476837 | consumed samples: 48435200 | consumed tokens: 99195289600 | elapsed time per iteration (s): 0.68 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.580979E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 376.227 | TFLOPs: 22.76 | 31: iteration 189300/ 476837 | consumed samples: 48460800 | consumed tokens: 99247718400 | elapsed time per iteration (s): 0.68 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.584276E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.009 | TFLOPs: 22.75 | 31: iteration 189400/ 476837 | consumed samples: 48486400 | consumed tokens: 99300147200 | elapsed time per iteration (s): 0.68 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.584295E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.920 | TFLOPs: 22.80 | 31: iteration 189500/ 476837 | consumed samples: 48512000 | consumed tokens: 99352576000 | elapsed time per iteration (s): 0.82 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.575902E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 313.790 | TFLOPs: 18.98 | 31: iteration 189600/ 476837 | consumed samples: 48537600 | consumed tokens: 99405004800 | elapsed time per iteration (s): 0.70 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.578242E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.449 | TFLOPs: 22.17 | 31: iteration 189700/ 476837 | consumed samples: 48563200 | consumed tokens: 99457433600 | elapsed time per iteration (s): 0.68 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.580304E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.015 | TFLOPs: 22.81 | 31: iteration 189800/ 476837 | consumed samples: 48588800 | consumed tokens: 99509862400 | elapsed time per iteration (s): 0.68 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 
2.578829E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.003 | TFLOPs: 22.81 | 31: iteration 189900/ 476837 | consumed samples: 48614400 | consumed tokens: 99562291200 | elapsed time per iteration (s): 0.69 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.584593E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.685 | TFLOPs: 22.61 | 0: [2023-04-27 09:31:55,736] [INFO] [logging.py:68:log_dist] [Rank 0] step=190000, skipped=0, lr=[0.00013985169864523774, 0.00013985169864523774, 0.00013985169864523774], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 190000/ 476837 | consumed samples: 48640000 | consumed tokens: 99614720000 | elapsed time per iteration (s): 0.71 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.574040E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.635 | TFLOPs: 21.88 | 0: steps: 190000 loss: 2.5753 iter time (s): 0.692 samples/sec: 369.814 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 190000 | lm loss value: 2.879343E+00 | lm loss PPL: 1.780257E+01 | 31: ------------------------------------------------------------------------------------------------- 31: iteration 190100/ 476837 | consumed samples: 48665600 | consumed tokens: 99667148800 | elapsed time per iteration (s): 0.71 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.577420E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.997 | TFLOPs: 21.90 | 31: iteration 190200/ 476837 | consumed samples: 48691200 | consumed tokens: 99719577600 | elapsed time per iteration (s): 0.73 | learning rate: 1.397E-04 | global batch size: 256 | lm 
loss: 2.580539E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 352.644 | TFLOPs: 21.33 | 31: iteration 190300/ 476837 | consumed samples: 48716800 | consumed tokens: 99772006400 | elapsed time per iteration (s): 0.68 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.576318E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.075 | TFLOPs: 22.81 | 31: iteration 190400/ 476837 | consumed samples: 48742400 | consumed tokens: 99824435200 | elapsed time per iteration (s): 0.68 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.577187E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.087 | TFLOPs: 22.81 | 31: iteration 190500/ 476837 | consumed samples: 48768000 | consumed tokens: 99876864000 | elapsed time per iteration (s): 0.68 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.579009E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.935 | TFLOPs: 22.80 | 31: iteration 190600/ 476837 | consumed samples: 48793600 | consumed tokens: 99929292800 | elapsed time per iteration (s): 0.68 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.577569E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.501 | TFLOPs: 22.78 | 31: iteration 190700/ 476837 | consumed samples: 48819200 | consumed tokens: 99981721600 | elapsed time per iteration (s): 0.68 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.582594E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.978 | TFLOPs: 22.81 | 31: iteration 190800/ 476837 | consumed samples: 48844800 | consumed 
tokens: 100034150400 | elapsed time per iteration (s): 0.71 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 2.573290E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 360.873 | TFLOPs: 21.83 | 31: iteration 190900/ 476837 | consumed samples: 48870400 | consumed tokens: 100086579200 | elapsed time per iteration (s): 0.68 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.578535E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.299 | TFLOPs: 22.77 | 31: iteration 191000/ 476837 | consumed samples: 48896000 | consumed tokens: 100139008000 | elapsed time per iteration (s): 0.68 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.581277E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.106 | TFLOPs: 22.81 | 31: iteration 191100/ 476837 | consumed samples: 48921600 | consumed tokens: 100191436800 | elapsed time per iteration (s): 0.68 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.582991E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.129 | TFLOPs: 22.82 | 31: iteration 191200/ 476837 | consumed samples: 48947200 | consumed tokens: 100243865600 | elapsed time per iteration (s): 0.69 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.579042E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.614 | TFLOPs: 22.48 | 31: iteration 191300/ 476837 | consumed samples: 48972800 | consumed tokens: 100296294400 | elapsed time per iteration (s): 0.69 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.581756E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 369.964 | TFLOPs: 22.38 | 31: iteration 191400/ 476837 | consumed samples: 48998400 | consumed tokens: 100348723200 | elapsed time per iteration (s): 0.70 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.582368E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 367.642 | TFLOPs: 22.24 | 31: iteration 191500/ 476837 | consumed samples: 49024000 | consumed tokens: 100401152000 | elapsed time per iteration (s): 0.68 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.578542E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.475 | TFLOPs: 22.65 | 31: iteration 191600/ 476837 | consumed samples: 49049600 | consumed tokens: 100453580800 | elapsed time per iteration (s): 0.68 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.577660E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.728 | TFLOPs: 22.79 | 31: iteration 191700/ 476837 | consumed samples: 49075200 | consumed tokens: 100506009600 | elapsed time per iteration (s): 0.68 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.580701E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.294 | TFLOPs: 22.83 | 31: iteration 191800/ 476837 | consumed samples: 49100800 | consumed tokens: 100558438400 | elapsed time per iteration (s): 0.68 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.576891E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.120 | TFLOPs: 22.75 | 31: iteration 191900/ 476837 | consumed samples: 49126400 | consumed tokens: 100610867200 | elapsed time per iteration (s): 0.68 | learning rate: 1.388E-04 | global batch size: 256 | 
lm loss: 2.577208E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.647 | TFLOPs: 22.73 | 0: [2023-04-27 09:54:49,343] [INFO] [logging.py:68:log_dist] [Rank 0] step=192000, skipped=0, lr=[0.00013871901004215242, 0.00013871901004215242, 0.00013871901004215242], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 192000/ 476837 | consumed samples: 49152000 | consumed tokens: 100663296000 | elapsed time per iteration (s): 0.68 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.575596E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.255 | TFLOPs: 22.82 | 0: steps: 192000 loss: 2.5512 iter time (s): 0.683 samples/sec: 374.568 31: iteration 192100/ 476837 | consumed samples: 49177600 | consumed tokens: 100715724800 | elapsed time per iteration (s): 0.68 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.576721E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.217 | TFLOPs: 22.82 | 31: iteration 192200/ 476837 | consumed samples: 49203200 | consumed tokens: 100768153600 | elapsed time per iteration (s): 0.68 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.574669E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.288 | TFLOPs: 22.82 | 31: iteration 192300/ 476837 | consumed samples: 49228800 | consumed tokens: 100820582400 | elapsed time per iteration (s): 0.68 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.574863E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.280 | TFLOPs: 22.82 | 31: iteration 192400/ 476837 | consumed samples: 49254400 | consumed tokens: 100873011200 | elapsed time per iteration (s): 
0.68 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.581563E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.878 | TFLOPs: 22.80 | 31: iteration 192500/ 476837 | consumed samples: 49280000 | consumed tokens: 100925440000 | elapsed time per iteration (s): 0.68 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.580551E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.422 | TFLOPs: 22.77 | 31: iteration 192600/ 476837 | consumed samples: 49305600 | consumed tokens: 100977868800 | elapsed time per iteration (s): 0.71 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.572614E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 359.331 | TFLOPs: 21.74 | 31: iteration 192700/ 476837 | consumed samples: 49331200 | consumed tokens: 101030297600 | elapsed time per iteration (s): 0.68 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.573454E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.274 | TFLOPs: 22.82 | 31: iteration 192800/ 476837 | consumed samples: 49356800 | consumed tokens: 101082726400 | elapsed time per iteration (s): 0.69 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.577008E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.266 | TFLOPs: 22.34 | 31: iteration 192900/ 476837 | consumed samples: 49382400 | consumed tokens: 101135155200 | elapsed time per iteration (s): 0.68 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.577751E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.199 | TFLOPs: 22.70 | 31: 
iteration 193000/ 476837 | consumed samples: 49408000 | consumed tokens: 101187584000 | elapsed time per iteration (s): 0.68 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.575305E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.881 | TFLOPs: 22.68 | 31: iteration 193100/ 476837 | consumed samples: 49433600 | consumed tokens: 101240012800 | elapsed time per iteration (s): 0.68 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 2.575444E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.717 | TFLOPs: 22.79 | 31: iteration 193200/ 476837 | consumed samples: 49459200 | consumed tokens: 101292441600 | elapsed time per iteration (s): 0.68 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.579763E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.092 | TFLOPs: 22.81 | 31: iteration 193300/ 476837 | consumed samples: 49484800 | consumed tokens: 101344870400 | elapsed time per iteration (s): 0.72 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.578060E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 356.235 | TFLOPs: 21.55 | 31: iteration 193400/ 476837 | consumed samples: 49510400 | consumed tokens: 101397299200 | elapsed time per iteration (s): 0.68 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.577317E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.148 | TFLOPs: 22.70 | 31: iteration 193500/ 476837 | consumed samples: 49536000 | consumed tokens: 101449728000 | elapsed time per iteration (s): 0.68 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.575140E+00 | grad norm: 0.328 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.097 | TFLOPs: 22.75 | 31: iteration 193600/ 476837 | consumed samples: 49561600 | consumed tokens: 101502156800 | elapsed time per iteration (s): 0.68 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.577206E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.120 | TFLOPs: 22.81 | 31: iteration 193700/ 476837 | consumed samples: 49587200 | consumed tokens: 101554585600 | elapsed time per iteration (s): 0.69 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.576539E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.253 | TFLOPs: 22.58 | 31: iteration 193800/ 476837 | consumed samples: 49612800 | consumed tokens: 101607014400 | elapsed time per iteration (s): 0.69 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.577106E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.423 | TFLOPs: 22.41 | 31: iteration 193900/ 476837 | consumed samples: 49638400 | consumed tokens: 101659443200 | elapsed time per iteration (s): 0.69 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.574713E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.160 | TFLOPs: 22.45 | 0: [2023-04-27 10:17:40,257] [INFO] [logging.py:68:log_dist] [Rank 0] step=194000, skipped=0, lr=[0.0001375812338477785, 0.0001375812338477785, 0.0001375812338477785], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 194000/ 476837 | consumed samples: 49664000 | consumed tokens: 101711872000 | elapsed time per iteration (s): 0.68 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.572488E+00 | grad norm: 0.293 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.039 | TFLOPs: 22.81 | 0: steps: 194000 loss: 2.6060 iter time (s): 0.682 samples/sec: 375.227 31: iteration 194100/ 476837 | consumed samples: 49689600 | consumed tokens: 101764300800 | elapsed time per iteration (s): 0.68 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.579295E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.954 | TFLOPs: 22.80 | 31: iteration 194200/ 476837 | consumed samples: 49715200 | consumed tokens: 101816729600 | elapsed time per iteration (s): 0.68 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.573950E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.894 | TFLOPs: 22.62 | 31: iteration 194300/ 476837 | consumed samples: 49740800 | consumed tokens: 101869158400 | elapsed time per iteration (s): 0.70 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.577032E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.306 | TFLOPs: 22.04 | 31: iteration 194400/ 476837 | consumed samples: 49766400 | consumed tokens: 101921587200 | elapsed time per iteration (s): 0.71 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.574727E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 358.886 | TFLOPs: 21.71 | 31: iteration 194500/ 476837 | consumed samples: 49792000 | consumed tokens: 101974016000 | elapsed time per iteration (s): 0.68 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.573391E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.081 | TFLOPs: 22.81 | 31: iteration 194600/ 476837 | consumed samples: 49817600 | 
consumed tokens: 102026444800 | elapsed time per iteration (s): 0.68 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.573745E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.047 | TFLOPs: 22.81 | 31: iteration 194700/ 476837 | consumed samples: 49843200 | consumed tokens: 102078873600 | elapsed time per iteration (s): 0.68 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.576721E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.800 | TFLOPs: 22.67 | 31: iteration 194800/ 476837 | consumed samples: 49868800 | consumed tokens: 102131302400 | elapsed time per iteration (s): 0.68 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.571627E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.922 | TFLOPs: 22.80 | 31: iteration 194900/ 476837 | consumed samples: 49894400 | consumed tokens: 102183731200 | elapsed time per iteration (s): 0.68 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.579304E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.069 | TFLOPs: 22.81 | 31: iteration 195000/ 476837 | consumed samples: 49920000 | consumed tokens: 102236160000 | elapsed time per iteration (s): 0.68 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.576178E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.931 | TFLOPs: 22.74 | 31: iteration 195100/ 476837 | consumed samples: 49945600 | consumed tokens: 102288588800 | elapsed time per iteration (s): 0.68 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.573043E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 376.894 | TFLOPs: 22.80 | 31: iteration 195200/ 476837 | consumed samples: 49971200 | consumed tokens: 102341017600 | elapsed time per iteration (s): 0.72 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.572888E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 357.027 | TFLOPs: 21.60 | 31: iteration 195300/ 476837 | consumed samples: 49996800 | consumed tokens: 102393446400 | elapsed time per iteration (s): 0.68 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.575767E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.132 | TFLOPs: 22.82 | 31: iteration 195400/ 476837 | consumed samples: 50022400 | consumed tokens: 102445875200 | elapsed time per iteration (s): 0.68 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.581226E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.096 | TFLOPs: 22.81 | 31: iteration 195500/ 476837 | consumed samples: 50048000 | consumed tokens: 102498304000 | elapsed time per iteration (s): 0.72 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.577434E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 357.495 | TFLOPs: 21.63 | 31: iteration 195600/ 476837 | consumed samples: 50073600 | consumed tokens: 102550732800 | elapsed time per iteration (s): 0.69 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.576139E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.432 | TFLOPs: 22.35 | 31: iteration 195700/ 476837 | consumed samples: 50099200 | consumed tokens: 102603161600 | elapsed time per iteration (s): 0.68 | learning rate: 1.366E-04 | global batch 
size: 256 | lm loss: 2.574920E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.943 | TFLOPs: 22.74 | 31: iteration 195800/ 476837 | consumed samples: 50124800 | consumed tokens: 102655590400 | elapsed time per iteration (s): 0.68 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 2.576080E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.955 | TFLOPs: 22.62 | 31: iteration 195900/ 476837 | consumed samples: 50150400 | consumed tokens: 102708019200 | elapsed time per iteration (s): 0.68 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.576178E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.565 | TFLOPs: 22.78 | 0: [2023-04-27 10:40:35,053] [INFO] [logging.py:68:log_dist] [Rank 0] step=196000, skipped=0, lr=[0.0001364385716199082, 0.0001364385716199082, 0.0001364385716199082], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 196000/ 476837 | consumed samples: 50176000 | consumed tokens: 102760448000 | elapsed time per iteration (s): 0.68 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.573541E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.579 | TFLOPs: 22.78 | 0: steps: 196000 loss: 2.5721 iter time (s): 0.684 samples/sec: 374.280 31: iteration 196100/ 476837 | consumed samples: 50201600 | consumed tokens: 102812876800 | elapsed time per iteration (s): 0.70 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.570559E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.006 | TFLOPs: 22.26 | 31: iteration 196200/ 476837 | consumed samples: 50227200 | consumed tokens: 102865305600 | elapsed time per iteration 
(s): 0.71 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.572157E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 358.631 | TFLOPs: 21.70 | 31: iteration 196300/ 476837 | consumed samples: 50252800 | consumed tokens: 102917734400 | elapsed time per iteration (s): 0.73 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.571829E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 349.847 | TFLOPs: 21.16 | 31: iteration 196400/ 476837 | consumed samples: 50278400 | consumed tokens: 102970163200 | elapsed time per iteration (s): 0.69 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.575300E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.161 | TFLOPs: 22.45 | 31: iteration 196500/ 476837 | consumed samples: 50304000 | consumed tokens: 103022592000 | elapsed time per iteration (s): 0.68 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.570132E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.684 | TFLOPs: 22.79 | 31: iteration 196600/ 476837 | consumed samples: 50329600 | consumed tokens: 103075020800 | elapsed time per iteration (s): 0.68 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.573218E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.022 | TFLOPs: 22.63 | 31: iteration 196700/ 476837 | consumed samples: 50355200 | consumed tokens: 103127449600 | elapsed time per iteration (s): 0.68 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.570683E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.560 | TFLOPs: 22.78 | 
31: iteration 196800/ 476837 | consumed samples: 50380800 | consumed tokens: 103179878400 | elapsed time per iteration (s): 0.68 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.574667E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.676 | TFLOPs: 22.79 | 31: iteration 196900/ 476837 | consumed samples: 50406400 | consumed tokens: 103232307200 | elapsed time per iteration (s): 0.69 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.574794E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.524 | TFLOPs: 22.29 | 31: iteration 197000/ 476837 | consumed samples: 50432000 | consumed tokens: 103284736000 | elapsed time per iteration (s): 0.68 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.570695E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.094 | TFLOPs: 22.63 | 31: iteration 197100/ 476837 | consumed samples: 50457600 | consumed tokens: 103337164800 | elapsed time per iteration (s): 0.71 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.582672E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 360.717 | TFLOPs: 21.82 | 31: iteration 197200/ 476837 | consumed samples: 50483200 | consumed tokens: 103389593600 | elapsed time per iteration (s): 0.69 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.576370E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.824 | TFLOPs: 22.55 | 31: iteration 197300/ 476837 | consumed samples: 50508800 | consumed tokens: 103442022400 | elapsed time per iteration (s): 0.68 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 2.574444E+00 | grad norm: 0.269 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.974 | TFLOPs: 22.81 | 31: iteration 197400/ 476837 | consumed samples: 50534400 | consumed tokens: 103494451200 | elapsed time per iteration (s): 0.68 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.574947E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.136 | TFLOPs: 22.76 | 31: iteration 197500/ 476837 | consumed samples: 50560000 | consumed tokens: 103546880000 | elapsed time per iteration (s): 0.69 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.574730E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.124 | TFLOPs: 22.45 | 31: iteration 197600/ 476837 | consumed samples: 50585600 | consumed tokens: 103599308800 | elapsed time per iteration (s): 0.69 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.568825E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.752 | TFLOPs: 22.49 | 31: iteration 197700/ 476837 | consumed samples: 50611200 | consumed tokens: 103651737600 | elapsed time per iteration (s): 0.68 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.574007E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.993 | TFLOPs: 22.81 | 31: iteration 197800/ 476837 | consumed samples: 50636800 | consumed tokens: 103704166400 | elapsed time per iteration (s): 0.68 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.567937E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.014 | TFLOPs: 22.81 | 31: iteration 197900/ 476837 | consumed samples: 50662400 | consumed tokens: 103756595200 | elapsed time per 
iteration (s): 0.68 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.572521E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.822 | TFLOPs: 22.80 | 0: [2023-04-27 11:03:35,605] [INFO] [logging.py:68:log_dist] [Rank 0] step=198000, skipped=0, lr=[0.00013529122578189752, 0.00013529122578189752, 0.00013529122578189752], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 198000/ 476837 | consumed samples: 50688000 | consumed tokens: 103809024000 | elapsed time per iteration (s): 0.70 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.572097E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.651 | TFLOPs: 22.12 | 0: steps: 198000 loss: 2.5771 iter time (s): 0.687 samples/sec: 372.656 31: iteration 198100/ 476837 | consumed samples: 50713600 | consumed tokens: 103861452800 | elapsed time per iteration (s): 0.70 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.569454E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 367.641 | TFLOPs: 22.24 | 31: iteration 198200/ 476837 | consumed samples: 50739200 | consumed tokens: 103913881600 | elapsed time per iteration (s): 0.69 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.572328E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.026 | TFLOPs: 22.57 | 31: iteration 198300/ 476837 | consumed samples: 50764800 | consumed tokens: 103966310400 | elapsed time per iteration (s): 0.68 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.574059E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.124 | TFLOPs: 22.82 | 31: iteration 198400/ 476837 | consumed samples: 
50790400 | consumed tokens: 104018739200 | elapsed time per iteration (s): 0.68 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.569558E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.890 | TFLOPs: 22.80 | 31: iteration 198500/ 476837 | consumed samples: 50816000 | consumed tokens: 104071168000 | elapsed time per iteration (s): 0.68 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.571357E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.664 | TFLOPs: 22.79 | 31: iteration 198600/ 476837 | consumed samples: 50841600 | consumed tokens: 104123596800 | elapsed time per iteration (s): 0.68 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.573270E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.339 | TFLOPs: 22.71 | 31: iteration 198700/ 476837 | consumed samples: 50867200 | consumed tokens: 104176025600 | elapsed time per iteration (s): 0.68 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.571681E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.039 | TFLOPs: 22.75 | 31: iteration 198800/ 476837 | consumed samples: 50892800 | consumed tokens: 104228454400 | elapsed time per iteration (s): 0.68 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.573562E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.007 | TFLOPs: 22.81 | 31: iteration 198900/ 476837 | consumed samples: 50918400 | consumed tokens: 104280883200 | elapsed time per iteration (s): 0.68 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.574074E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 374.141 | TFLOPs: 22.63 | 31: iteration 199000/ 476837 | consumed samples: 50944000 | consumed tokens: 104333312000 | elapsed time per iteration (s): 0.68 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.569048E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.956 | TFLOPs: 22.80 | 31: iteration 199100/ 476837 | consumed samples: 50969600 | consumed tokens: 104385740800 | elapsed time per iteration (s): 0.68 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.575352E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.322 | TFLOPs: 22.65 | 31: iteration 199200/ 476837 | consumed samples: 50995200 | consumed tokens: 104438169600 | elapsed time per iteration (s): 0.68 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.567248E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.099 | TFLOPs: 22.81 | 31: iteration 199300/ 476837 | consumed samples: 51020800 | consumed tokens: 104490598400 | elapsed time per iteration (s): 0.68 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.571313E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.022 | TFLOPs: 22.81 | 31: iteration 199400/ 476837 | consumed samples: 51046400 | consumed tokens: 104543027200 | elapsed time per iteration (s): 0.70 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.571440E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.322 | TFLOPs: 22.28 | 31: iteration 199500/ 476837 | consumed samples: 51072000 | consumed tokens: 104595456000 | elapsed time per iteration (s): 0.68 | learning rate: 1.344E-04 | global 
batch size: 256 | lm loss: 2.569335E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.902 | TFLOPs: 22.80 | 31: iteration 199600/ 476837 | consumed samples: 51097600 | consumed tokens: 104647884800 | elapsed time per iteration (s): 0.68 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.568237E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.767 | TFLOPs: 22.79 | 31: iteration 199700/ 476837 | consumed samples: 51123200 | consumed tokens: 104700313600 | elapsed time per iteration (s): 0.68 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.573682E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.848 | TFLOPs: 22.80 | 31: iteration 199800/ 476837 | consumed samples: 51148800 | consumed tokens: 104752742400 | elapsed time per iteration (s): 0.68 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.571503E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.752 | TFLOPs: 22.79 | 31: iteration 199900/ 476837 | consumed samples: 51174400 | consumed tokens: 104805171200 | elapsed time per iteration (s): 0.72 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.570802E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 357.350 | TFLOPs: 21.62 | 0: [2023-04-27 11:26:23,440] [INFO] [logging.py:68:log_dist] [Rank 0] step=200000, skipped=0, lr=[0.00013413939958680716, 0.00013413939958680716, 0.00013413939958680716], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 200000/ 476837 | consumed samples: 51200000 | consumed tokens: 104857600000 | elapsed time per iteration (s): 0.68 | learning rate: 1.341E-04 | global batch size: 256 
| lm loss: 2.574055E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.306 | TFLOPs: 22.71 | 0: steps: 200000 loss: 2.5416 iter time (s): 0.681 samples/sec: 376.152 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 200000 | lm loss value: 2.922034E+00 | lm loss PPL: 1.857903E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 200000 to checkpoints_1b1250b1b5 0: [2023-04-27 11:26:23,818] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step200000 is begin to save! 0: [2023-04-27 11:26:23,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_01-model_00-model_states.pt... 0: [2023-04-27 11:26:24,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_01-model_00-model_states.pt. 0: [2023-04-27 11:26:24,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_03-model_00-model_states.pt... 0: [2023-04-27 11:26:24,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_03-model_00-model_states.pt. 0: [2023-04-27 11:26:24,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_04-model_00-model_states.pt... 0: [2023-04-27 11:26:24,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_04-model_00-model_states.pt. 0: [2023-04-27 11:26:24,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_05-model_00-model_states.pt... 
0: [2023-04-27 11:26:24,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_05-model_00-model_states.pt. 0: [2023-04-27 11:26:24,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_06-model_00-model_states.pt... 0: [2023-04-27 11:26:24,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_06-model_00-model_states.pt. 0: [2023-04-27 11:26:24,502] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_07-model_00-model_states.pt... 0: [2023-04-27 11:26:24,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_07-model_00-model_states.pt. 0: [2023-04-27 11:26:24,589] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_08-model_00-model_states.pt... 0: [2023-04-27 11:26:24,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_08-model_00-model_states.pt. 0: [2023-04-27 11:26:24,679] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_09-model_00-model_states.pt... 0: [2023-04-27 11:26:24,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_09-model_00-model_states.pt. 0: [2023-04-27 11:26:24,766] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_10-model_00-model_states.pt... 0: [2023-04-27 11:26:24,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_10-model_00-model_states.pt. 0: [2023-04-27 11:26:24,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_11-model_00-model_states.pt... 
0: [2023-04-27 11:26:24,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_11-model_00-model_states.pt. 0: [2023-04-27 11:26:24,942] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_12-model_00-model_states.pt... 0: [2023-04-27 11:26:25,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_12-model_00-model_states.pt. 0: [2023-04-27 11:26:25,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_13-model_00-model_states.pt... 0: [2023-04-27 11:26:25,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_13-model_00-model_states.pt. 0: [2023-04-27 11:26:25,119] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_14-model_00-model_states.pt... 0: [2023-04-27 11:26:25,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_14-model_00-model_states.pt. 0: [2023-04-27 11:26:25,196] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_15-model_00-model_states.pt... 0: [2023-04-27 11:26:25,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_15-model_00-model_states.pt. 0: [2023-04-27 11:26:25,281] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_16-model_00-model_states.pt... 0: [2023-04-27 11:26:25,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_16-model_00-model_states.pt. 0: [2023-04-27 11:26:25,369] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_17-model_00-model_states.pt... 
0: [2023-04-27 11:26:25,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_17-model_00-model_states.pt. 0: [2023-04-27 11:26:25,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_18-model_00-model_states.pt... 0: [2023-04-27 11:26:25,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_18-model_00-model_states.pt. 0: [2023-04-27 11:26:25,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_19-model_00-model_states.pt... 0: [2023-04-27 11:26:25,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_19-model_00-model_states.pt. 0: [2023-04-27 11:26:25,636] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_20-model_00-model_states.pt... 0: [2023-04-27 11:26:25,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_20-model_00-model_states.pt. 0: [2023-04-27 11:26:25,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_21-model_00-model_states.pt... 0: [2023-04-27 11:26:25,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_21-model_00-model_states.pt. 0: [2023-04-27 11:26:25,814] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_22-model_00-model_states.pt... 0: [2023-04-27 11:26:25,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_22-model_00-model_states.pt. 0: [2023-04-27 11:26:25,904] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_23-model_00-model_states.pt... 
0: [2023-04-27 11:26:25,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_23-model_00-model_states.pt. 0: [2023-04-27 11:26:25,992] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_24-model_00-model_states.pt... 0: [2023-04-27 11:26:26,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_24-model_00-model_states.pt. 0: [2023-04-27 11:26:26,079] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_25-model_00-model_states.pt... 0: [2023-04-27 11:26:26,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_25-model_00-model_states.pt. 0: [2023-04-27 11:26:26,168] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_26-model_00-model_states.pt... 0: [2023-04-27 11:26:26,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_26-model_00-model_states.pt. 0: [2023-04-27 11:26:26,246] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_27-model_00-model_states.pt... 0: [2023-04-27 11:26:26,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_27-model_00-model_states.pt. 0: [2023-04-27 11:26:26,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_28-model_00-model_states.pt... 0: [2023-04-27 11:26:26,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_28-model_00-model_states.pt. 0: [2023-04-27 11:26:26,423] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/layer_30-model_00-model_states.pt... 
0: [2023-04-27 11:26:26,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/layer_30-model_00-model_states.pt. 0: [2023-04-27 11:26:26,427] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step200000/mp_rank_00_model_states.pt 0: [2023-04-27 11:26:26,427] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/mp_rank_00_model_states.pt... 0: [2023-04-27 11:26:26,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/mp_rank_00_model_states.pt. 0: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 3: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 10: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 10: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 10: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 9: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 
12: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 12: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 12: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 12: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 13: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 20: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 20: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 19: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 24: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 24: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 
24: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 27: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 27: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 23: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 23: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 23: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 23: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 29: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 29: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 
29: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 26: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 30: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 30: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 16: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 16: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 16: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 16: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 0: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 4: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 2: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 8: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 11: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 11: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 11: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 
3: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 10: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 9: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 9: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 14: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 14: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 14: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 15: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 
15: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 12: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 12: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 13: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 20: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 19: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 18: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 18: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 18: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 
24: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 17: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 27: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 21: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 29: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 25: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 25: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 25: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 28: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 
26: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 30: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 31: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 31: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 16: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 22: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 22: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 22: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 6: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
0: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 4: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 1: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
8: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 8: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 8: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 11: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 11: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 3: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 10: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 10: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 10: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 9: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 9: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 
9: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 9: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 14: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 15: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 12: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 13: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 13: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 13: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 13: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 20: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 20: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 
19: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 19: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 19: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 18: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 24: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 24: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 17: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 17: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 17: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 27: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 
27: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 21: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 21: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 23: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 23: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 29: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 29: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 25: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 28: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 26: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 
26: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 26: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 30: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 31: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 16: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 22: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 6: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 4: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 1: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 2: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 8: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 11: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 10: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 9: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 14: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 
14: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 14: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 14: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 15: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 15: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 15: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 12: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 20: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 20: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 19: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 
18: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 18: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 24: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 17: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 27: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 27: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 21: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 23: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 29: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 
25: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 25: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 25: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 28: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 28: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 26: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 26: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 30: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 30: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 30: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 
31: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 31: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 16: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 22: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 22: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 22: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 6: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
2: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 8: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 8: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 11: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 15: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 20: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 19: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 18: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 24: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 17: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 21: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 
21: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 23: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 28: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 30: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 31: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 31: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 16: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 22: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 6: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 4: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
1: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 8: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 19: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 18: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 17: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 21: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 21: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 28: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 31: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 6: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
4: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 17: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 28: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 28: [2023-04-27 11:26:26,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 10: [2023-04-27 11:26:26,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-27 11:26:26,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 10: [2023-04-27 11:26:26,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 7: [2023-04-27 11:26:26,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-27 11:26:26,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-04-27 11:26:26,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 15: [2023-04-27 11:26:26,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 
15: [2023-04-27 11:26:26,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-27 11:26:26,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 0: [2023-04-27 11:26:26,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-27 11:26:26,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-27 11:26:26,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 0: [2023-04-27 11:26:26,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-04-27 11:26:26,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-27 11:26:26,751] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-27 11:26:26,751] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 10: [2023-04-27 11:26:26,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-27 11:26:26,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2023-04-27 11:26:26,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
10: [2023-04-27 11:26:26,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 10: [2023-04-27 11:26:26,755] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 10: [2023-04-27 11:26:26,755] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 10: [2023-04-27 11:26:26,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2023-04-27 11:26:26,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2023-04-27 11:26:26,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 10: [2023-04-27 11:26:26,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 0: [2023-04-27 11:26:26,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-27 11:26:26,759] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-27 11:26:26,759] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 7: [2023-04-27 11:26:26,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-04-27 11:26:26,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-27 11:26:26,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 7: [2023-04-27 11:26:26,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-04-27 11:26:26,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-27 11:26:26,761] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 0: [2023-04-27 11:26:26,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-04-27 11:26:26,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-27 11:26:26,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 15: [2023-04-27 11:26:26,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 15: [2023-04-27 11:26:26,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-27 11:26:26,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 7: [2023-04-27 11:26:26,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-04-27 11:26:26,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-27 11:26:26,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 15: [2023-04-27 11:26:26,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 15: [2023-04-27 11:26:26,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-27 11:26:26,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 15: [2023-04-27 11:26:26,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 15: [2023-04-27 11:26:26,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-27 11:26:26,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 15: [2023-04-27 11:26:26,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 15: [2023-04-27 11:26:26,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2023-04-27 11:26:26,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 0: [2023-04-27 11:26:26,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-04-27 11:26:26,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-27 11:26:26,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 0: [2023-04-27 11:26:26,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-27 11:26:26,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-27 11:26:26,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 7: [2023-04-27 11:26:26,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-04-27 11:26:26,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-27 11:26:26,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 7: [2023-04-27 11:26:26,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-04-27 11:26:26,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-27 11:26:26,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 0: [2023-04-27 11:26:26,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-04-27 11:26:26,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 10: [2023-04-27 11:26:26,757] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-27 11:26:26,758] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 0: [2023-04-27 11:26:26,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 10: [2023-04-27 11:26:26,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-27 11:26:26,760] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-27 11:26:26,760] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 10: [2023-04-27 11:26:26,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 10: [2023-04-27 11:26:26,762] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2023-04-27 11:26:26,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 10: [2023-04-27 11:26:26,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 
10: [2023-04-27 11:26:26,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-27 11:26:26,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 15: [2023-04-27 11:26:26,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-27 11:26:26,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-27 11:26:26,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 15: [2023-04-27 11:26:26,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-27 11:26:26,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2023-04-27 11:26:26,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 7: [2023-04-27 11:26:26,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-27 11:26:26,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-04-27 11:26:26,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-27 11:26:26,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-27 11:26:26,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 7: [2023-04-27 11:26:26,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-04-27 11:26:26,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-27 11:26:26,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-27 11:26:26,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-04-27 11:26:26,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-27 11:26:26,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-27 11:26:26,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-27 11:26:26,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 1: [2023-04-27 11:26:26,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 8: [2023-04-27 11:26:26,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 8: [2023-04-27 11:26:26,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 8: [2023-04-27 11:26:26,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 15: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-27 11:26:26,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 
25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2023-04-27 11:26:26,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-27 11:26:26,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 25: [2023-04-27 11:26:26,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-27 11:26:26,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-27 11:26:26,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-27 11:26:26,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-27 11:26:26,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-27 11:26:26,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 25: [2023-04-27 11:26:26,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 14: [2023-04-27 11:26:26,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-27 11:26:26,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 14: [2023-04-27 11:26:26,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 
14: [2023-04-27 11:26:26,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2023-04-27 11:26:26,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-27 11:26:26,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-27 11:26:26,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2023-04-27 11:26:26,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 14: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 14: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 14: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 14: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 14: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 14: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 14: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 14: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 
30: [2023-04-27 11:26:26,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 
30: [2023-04-27 11:26:26,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-27 11:26:26,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-27 11:26:26,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-27 11:26:26,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-27 11:26:26,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-27 11:26:26,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-27 11:26:26,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 30: [2023-04-27 11:26:26,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 1: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-04-27 11:26:26,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 
19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 19: [2023-04-27 11:26:26,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 4: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
4: [2023-04-27 11:26:26,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-27 11:26:26,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-04-27 11:26:26,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 
16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2023-04-27 11:26:26,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-27 11:26:26,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-27 11:26:26,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-27 11:26:26,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-27 11:26:26,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-27 11:26:26,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 16: [2023-04-27 11:26:26,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 16: [2023-04-27 11:26:26,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 16: [2023-04-27 11:26:26,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-27 11:26:26,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-27 11:26:26,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-27 11:26:26,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-04-27 11:26:26,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-27 11:26:26,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-27 11:26:26,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-27 11:26:26,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
2: [2023-04-27 11:26:26,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 2: [2023-04-27 11:26:26,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 13: [2023-04-27 11:26:26,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-27 11:26:26,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2023-04-27 11:26:26,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-27 11:26:26,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-27 11:26:26,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
13: [2023-04-27 11:26:26,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 0: [2023-04-27 11:26:26,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 0: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 19: [2023-04-27 11:26:26,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 19: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2023-04-27 11:26:26,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 22: [2023-04-27 11:26:26,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-27 11:26:26,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-27 11:26:26,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 
22: [2023-04-27 11:26:26,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-27 11:26:26,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 22: [2023-04-27 11:26:26,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-27 11:26:26,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2023-04-27 11:26:26,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2023-04-27 11:26:26,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 22: [2023-04-27 11:26:26,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 
22: [2023-04-27 11:26:26,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-27 11:26:26,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2023-04-27 11:26:26,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-27 11:26:26,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-27 11:26:26,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-27 11:26:26,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-27 11:26:26,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 22: [2023-04-27 11:26:26,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-27 11:26:26,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 22: [2023-04-27 11:26:26,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 22: [2023-04-27 11:26:26,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 22: [2023-04-27 11:26:26,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
22: [2023-04-27 11:26:26,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 22: [2023-04-27 11:26:26,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 12: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-27 11:26:26,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 
12: [2023-04-27 11:26:26,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-27 11:26:26,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-27 11:26:26,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-27 11:26:26,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-27 11:26:26,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-27 11:26:26,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-27 11:26:26,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 12: [2023-04-27 11:26:26,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 12: [2023-04-27 11:26:26,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 12: [2023-04-27 11:26:26,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 12: [2023-04-27 11:26:26,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 12: [2023-04-27 11:26:26,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
12: [2023-04-27 11:26:26,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 12: [2023-04-27 11:26:26,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2023-04-27 11:26:26,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2023-04-27 11:26:26,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-27 11:26:26,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 12: [2023-04-27 11:26:26,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 16: [2023-04-27 11:26:26,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 16: [2023-04-27 11:26:26,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-27 11:26:26,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-27 11:26:26,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 
9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 
9: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 9: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 9: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 13: [2023-04-27 11:26:26,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-27 11:26:26,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-27 11:26:26,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 13: [2023-04-27 11:26:26,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-27 11:26:26,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-27 11:26:26,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 13: [2023-04-27 11:26:26,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
13: [2023-04-27 11:26:26,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2023-04-27 11:26:26,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 
20: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 20: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
20: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-27 11:26:26,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 20: [2023-04-27 11:26:26,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 
29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-27 11:26:26,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2023-04-27 11:26:26,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-27 11:26:26,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-27 11:26:26,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2023-04-27 11:26:26,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 29: [2023-04-27 11:26:26,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-27 11:26:26,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 29: [2023-04-27 11:26:26,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 29: [2023-04-27 11:26:26,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-27 11:26:26,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-27 11:26:26,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 13: [2023-04-27 11:26:26,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2023-04-27 11:26:26,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 
13: [2023-04-27 11:26:26,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-27 11:26:26,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-27 11:26:26,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 13: [2023-04-27 11:26:26,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 4: [2023-04-27 11:26:26,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-27 11:26:26,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 13: [2023-04-27 11:26:26,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2023-04-27 11:26:26,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2023-04-27 11:26:26,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 
27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 
27: [2023-04-27 11:26:26,841] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-27 11:26:26,841] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-27 11:26:26,841] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-27 11:26:26,841] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-27 11:26:26,841] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2023-04-27 11:26:26,841] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-27 11:26:26,841] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-27 11:26:26,841] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 27: [2023-04-27 11:26:26,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 8: [2023-04-27 11:26:26,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-27 11:26:26,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-27 11:26:26,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-27 11:26:26,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 8: [2023-04-27 11:26:26,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 8: [2023-04-27 11:26:26,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 8: [2023-04-27 11:26:26,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 
8: [2023-04-27 11:26:26,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-27 11:26:26,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 8: [2023-04-27 11:26:26,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 8: [2023-04-27 11:26:26,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-27 11:26:26,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 8: [2023-04-27 11:26:26,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 
11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 11: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 
26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2023-04-27 11:26:26,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 26: [2023-04-27 11:26:26,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-04-27 11:26:26,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-27 11:26:26,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-27 11:26:26,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-27 11:26:26,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-27 11:26:26,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-27 11:26:26,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-27 11:26:26,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-27 11:26:26,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-04-27 11:26:26,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-27 11:26:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-27 11:26:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-27 11:26:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-27 11:26:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-27 11:26:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-27 11:26:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-27 11:26:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-27 11:26:26,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 3: [2023-04-27 11:26:26,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 
23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 23: [2023-04-27 11:26:26,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2023-04-27 11:26:26,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-27 11:26:26,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-27 11:26:26,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2023-04-27 11:26:26,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-27 11:26:26,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-27 11:26:26,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-27 11:26:26,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step200000 is ready now! 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 23: [2023-04-27 11:26:26,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 8: [2023-04-27 11:26:26,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-27 11:26:26,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 8: [2023-04-27 11:26:26,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2023-04-27 11:26:26,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-27 11:26:26,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 8: [2023-04-27 11:26:26,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 
8: [2023-04-27 11:26:26,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2023-04-27 11:26:26,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 26: [2023-04-27 11:26:26,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-27 11:26:26,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-27 11:26:26,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 
21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 21: [2023-04-27 11:26:26,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-27 11:26:26,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-27 11:26:26,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-27 11:26:26,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-27 11:26:26,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-27 11:26:26,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-27 11:26:26,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-27 11:26:26,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step200000 is ready now! 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 21: [2023-04-27 11:26:26,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 
18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-27 11:26:26,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-27 11:26:26,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-27 11:26:26,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-27 11:26:26,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-27 11:26:26,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
18: [2023-04-27 11:26:26,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-27 11:26:26,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 18: [2023-04-27 11:26:26,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 18: [2023-04-27 11:26:26,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 
24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 24: [2023-04-27 11:26:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2023-04-27 11:26:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-27 11:26:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-27 11:26:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-27 11:26:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2023-04-27 11:26:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-27 11:26:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-27 11:26:26,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 24: [2023-04-27 11:26:26,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 
17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 
17: [2023-04-27 11:26:26,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-27 11:26:26,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-27 11:26:26,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-27 11:26:26,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2023-04-27 11:26:26,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-27 11:26:26,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2023-04-27 11:26:26,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-27 11:26:26,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 17: [2023-04-27 11:26:26,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 
28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-27 11:26:26,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-27 11:26:26,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-27 11:26:26,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-27 11:26:26,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-27 11:26:26,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-27 11:26:26,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-27 11:26:26,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-27 11:26:26,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 28: [2023-04-27 11:26:26,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 31: [2023-04-27 11:26:26,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-27 11:26:26,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-27 11:26:26,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-27 11:26:26,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 
31: [2023-04-27 11:26:26,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-27 11:26:26,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-27 11:26:26,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-27 11:26:26,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-27 11:26:26,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 31: [2023-04-27 11:26:26,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 31: [2023-04-27 11:26:26,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 31: [2023-04-27 11:26:26,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 31: [2023-04-27 11:26:26,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-27 11:26:26,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2023-04-27 11:26:26,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 31: [2023-04-27 11:26:26,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 
31: [2023-04-27 11:26:26,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-27 11:26:26,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-27 11:26:26,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-27 11:26:26,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-27 11:26:26,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 31: [2023-04-27 11:26:26,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 31: [2023-04-27 11:26:26,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 31: [2023-04-27 11:26:26,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 6: [2023-04-27 11:26:26,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-27 11:26:26,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-27 11:26:26,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-27 11:26:26,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-04-27 11:26:26,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-27 11:26:26,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-27 11:26:26,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-27 11:26:26,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-27 11:26:26,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 6: [2023-04-27 11:26:26,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 6: [2023-04-27 11:26:26,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 6: [2023-04-27 11:26:26,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 6: [2023-04-27 11:26:26,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-27 11:26:26,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-04-27 11:26:26,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-27 11:26:26,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
6: [2023-04-27 11:26:26,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-27 11:26:26,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 6: [2023-04-27 11:26:26,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-27 11:26:26,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-04-27 11:26:26,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-27 11:26:26,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step200000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-27 11:26:26,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 6: [2023-04-27 11:26:26,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step200000 is ready now! 
0: successfully saved checkpoint at iteration 200000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 3163.44 31: iteration 200100/ 476837 | consumed samples: 51225600 | consumed tokens: 104910028800 | elapsed time per iteration (s): 0.71 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 2.572698E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 358.346 | TFLOPs: 21.68 | 31: iteration 200200/ 476837 | consumed samples: 51251200 | consumed tokens: 104962457600 | elapsed time per iteration (s): 0.68 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.570203E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.682 | TFLOPs: 22.79 | 31: iteration 200300/ 476837 | consumed samples: 51276800 | consumed tokens: 105014886400 | elapsed time per iteration (s): 0.71 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.568621E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 362.386 | TFLOPs: 21.92 | 31: iteration 200400/ 476837 | consumed samples: 51302400 | consumed tokens: 105067315200 | elapsed time per iteration (s): 0.71 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.568544E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 358.397 | TFLOPs: 21.68 | 31: iteration 200500/ 476837 | consumed samples: 51328000 | consumed tokens: 105119744000 | elapsed time per iteration (s): 0.68 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.572391E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.622 | TFLOPs: 22.78 | 31: iteration 200600/ 476837 | consumed samples: 51353600 | consumed tokens: 105172172800 | elapsed time per 
iteration (s): 0.68 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.567631E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.660 | TFLOPs: 22.79 | 31: iteration 200700/ 476837 | consumed samples: 51379200 | consumed tokens: 105224601600 | elapsed time per iteration (s): 0.68 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.565880E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.980 | TFLOPs: 22.75 | 31: iteration 200800/ 476837 | consumed samples: 51404800 | consumed tokens: 105277030400 | elapsed time per iteration (s): 0.68 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.573356E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.856 | TFLOPs: 22.80 | 31: iteration 200900/ 476837 | consumed samples: 51430400 | consumed tokens: 105329459200 | elapsed time per iteration (s): 0.68 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.568737E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.951 | TFLOPs: 22.80 | 31: iteration 201000/ 476837 | consumed samples: 51456000 | consumed tokens: 105381888000 | elapsed time per iteration (s): 0.69 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.570961E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.379 | TFLOPs: 22.41 | 31: iteration 201100/ 476837 | consumed samples: 51481600 | consumed tokens: 105434316800 | elapsed time per iteration (s): 0.71 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.571144E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 360.239 | 
TFLOPs: 21.79 | 31: iteration 201200/ 476837 | consumed samples: 51507200 | consumed tokens: 105486745600 | elapsed time per iteration (s): 0.68 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.565136E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.763 | TFLOPs: 22.79 | 31: iteration 201300/ 476837 | consumed samples: 51532800 | consumed tokens: 105539174400 | elapsed time per iteration (s): 0.69 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.571522E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.738 | TFLOPs: 22.49 | 31: iteration 201400/ 476837 | consumed samples: 51558400 | consumed tokens: 105591603200 | elapsed time per iteration (s): 0.68 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.572142E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.177 | TFLOPs: 22.76 | 31: iteration 201500/ 476837 | consumed samples: 51584000 | consumed tokens: 105644032000 | elapsed time per iteration (s): 0.72 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.567954E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 354.252 | TFLOPs: 21.43 | 31: iteration 201600/ 476837 | consumed samples: 51609600 | consumed tokens: 105696460800 | elapsed time per iteration (s): 0.68 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 2.570986E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.489 | TFLOPs: 22.78 | 31: iteration 201700/ 476837 | consumed samples: 51635200 | consumed tokens: 105748889600 | elapsed time per iteration (s): 0.68 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 2.570617E+00 | grad norm: 
0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.573 | TFLOPs: 22.78 | 31: iteration 201800/ 476837 | consumed samples: 51660800 | consumed tokens: 105801318400 | elapsed time per iteration (s): 0.73 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.570273E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 349.864 | TFLOPs: 21.17 | 31: iteration 201900/ 476837 | consumed samples: 51686400 | consumed tokens: 105853747200 | elapsed time per iteration (s): 0.76 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.568983E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 337.355 | TFLOPs: 20.41 | 0: [2023-04-27 11:49:35,128] [INFO] [logging.py:68:log_dist] [Rank 0] step=202000, skipped=0, lr=[0.00013298329708139595, 0.00013298329708139595, 0.00013298329708139595], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 202000/ 476837 | consumed samples: 51712000 | consumed tokens: 105906176000 | elapsed time per iteration (s): 0.68 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.571071E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.490 | TFLOPs: 22.78 | 0: steps: 202000 loss: 2.5546 iter time (s): 0.692 samples/sec: 370.203 31: iteration 202100/ 476837 | consumed samples: 51737600 | consumed tokens: 105958604800 | elapsed time per iteration (s): 0.68 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.562767E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.239 | TFLOPs: 22.70 | 31: iteration 202200/ 476837 | consumed samples: 51763200 | consumed tokens: 106011033600 | elapsed time per iteration (s): 0.69 | learning rate: 1.329E-04 | 
global batch size: 256 | lm loss: 2.565617E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.370 | TFLOPs: 22.41 | 31: iteration 202300/ 476837 | consumed samples: 51788800 | consumed tokens: 106063462400 | elapsed time per iteration (s): 0.68 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.570401E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.632 | TFLOPs: 22.79 | 31: iteration 202400/ 476837 | consumed samples: 51814400 | consumed tokens: 106115891200 | elapsed time per iteration (s): 0.68 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.572501E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.628 | TFLOPs: 22.79 | 31: iteration 202500/ 476837 | consumed samples: 51840000 | consumed tokens: 106168320000 | elapsed time per iteration (s): 0.68 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 2.566884E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.612 | TFLOPs: 22.72 | 31: iteration 202600/ 476837 | consumed samples: 51865600 | consumed tokens: 106220748800 | elapsed time per iteration (s): 0.70 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.567109E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 363.508 | TFLOPs: 21.99 | 31: iteration 202700/ 476837 | consumed samples: 51891200 | consumed tokens: 106273177600 | elapsed time per iteration (s): 0.76 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.571347E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 335.862 | TFLOPs: 20.32 | 31: iteration 202800/ 476837 | consumed 
samples: 51916800 | consumed tokens: 106325606400 | elapsed time per iteration (s): 0.70 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.562290E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.071 | TFLOPs: 22.09 | 31: iteration 202900/ 476837 | consumed samples: 51942400 | consumed tokens: 106378035200 | elapsed time per iteration (s): 0.72 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.573103E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 356.575 | TFLOPs: 21.57 | 31: iteration 203000/ 476837 | consumed samples: 51968000 | consumed tokens: 106430464000 | elapsed time per iteration (s): 0.68 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.565309E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.663 | TFLOPs: 22.79 | 31: iteration 203100/ 476837 | consumed samples: 51993600 | consumed tokens: 106482892800 | elapsed time per iteration (s): 0.69 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.565786E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.960 | TFLOPs: 22.32 | 31: iteration 203200/ 476837 | consumed samples: 52019200 | consumed tokens: 106535321600 | elapsed time per iteration (s): 0.77 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.569630E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 333.874 | TFLOPs: 20.20 | 31: iteration 203300/ 476837 | consumed samples: 52044800 | consumed tokens: 106587750400 | elapsed time per iteration (s): 0.68 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.567593E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 376.658 | TFLOPs: 22.79 | 31: iteration 203400/ 476837 | consumed samples: 52070400 | consumed tokens: 106640179200 | elapsed time per iteration (s): 0.68 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.572638E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.870 | TFLOPs: 22.62 | 31: iteration 203500/ 476837 | consumed samples: 52096000 | consumed tokens: 106692608000 | elapsed time per iteration (s): 0.68 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.569658E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.689 | TFLOPs: 22.79 | 31: iteration 203600/ 476837 | consumed samples: 52121600 | consumed tokens: 106745036800 | elapsed time per iteration (s): 0.68 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.565127E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.644 | TFLOPs: 22.79 | 31: iteration 203700/ 476837 | consumed samples: 52147200 | consumed tokens: 106797465600 | elapsed time per iteration (s): 0.77 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.568723E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 331.680 | TFLOPs: 20.07 | 31: iteration 203800/ 476837 | consumed samples: 52172800 | consumed tokens: 106849894400 | elapsed time per iteration (s): 0.70 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.569202E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.817 | TFLOPs: 22.13 | 31: iteration 203900/ 476837 | consumed samples: 52198400 | consumed tokens: 106902323200 | elapsed time per iteration (s): 0.69 | learning rate: 1.319E-04 
| global batch size: 256 | lm loss: 2.572726E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.652 | TFLOPs: 22.60 | 0: [2023-04-27 12:12:55,127] [INFO] [logging.py:68:log_dist] [Rank 0] step=204000, skipped=0, lr=[0.00013182312306997388, 0.00013182312306997388, 0.00013182312306997388], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 204000/ 476837 | consumed samples: 52224000 | consumed tokens: 106954752000 | elapsed time per iteration (s): 0.68 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.567124E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.894 | TFLOPs: 22.80 | 0: steps: 204000 loss: 2.5973 iter time (s): 0.697 samples/sec: 367.475 31: iteration 204100/ 476837 | consumed samples: 52249600 | consumed tokens: 107007180800 | elapsed time per iteration (s): 0.72 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.569606E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 354.012 | TFLOPs: 21.42 | 31: iteration 204200/ 476837 | consumed samples: 52275200 | consumed tokens: 107059609600 | elapsed time per iteration (s): 0.75 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.569004E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 342.355 | TFLOPs: 20.71 | 31: iteration 204300/ 476837 | consumed samples: 52300800 | consumed tokens: 107112038400 | elapsed time per iteration (s): 0.68 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.565380E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.505 | TFLOPs: 22.72 | 31: iteration 204400/ 476837 | consumed samples: 52326400 | consumed tokens: 107164467200 | elapsed 
time per iteration (s): 0.68 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.566338E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.780 | TFLOPs: 22.79 | 31: iteration 204500/ 476837 | consumed samples: 52352000 | consumed tokens: 107216896000 | elapsed time per iteration (s): 0.68 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.567408E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.386 | TFLOPs: 22.77 | 31: iteration 204600/ 476837 | consumed samples: 52377600 | consumed tokens: 107269324800 | elapsed time per iteration (s): 0.68 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.564047E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.984 | TFLOPs: 22.81 | 31: iteration 204700/ 476837 | consumed samples: 52403200 | consumed tokens: 107321753600 | elapsed time per iteration (s): 0.68 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.569686E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.865 | TFLOPs: 22.62 | 31: iteration 204800/ 476837 | consumed samples: 52428800 | consumed tokens: 107374182400 | elapsed time per iteration (s): 0.68 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.569745E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.135 | TFLOPs: 22.82 | 31: iteration 204900/ 476837 | consumed samples: 52454400 | consumed tokens: 107426611200 | elapsed time per iteration (s): 0.70 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 2.566375E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.022 
| TFLOPs: 22.08 | 31: iteration 205000/ 476837 | consumed samples: 52480000 | consumed tokens: 107479040000 | elapsed time per iteration (s): 0.68 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.565717E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.650 | TFLOPs: 22.79 | 31: iteration 205100/ 476837 | consumed samples: 52505600 | consumed tokens: 107531468800 | elapsed time per iteration (s): 0.70 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.565603E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.128 | TFLOPs: 22.03 | 31: iteration 205200/ 476837 | consumed samples: 52531200 | consumed tokens: 107583897600 | elapsed time per iteration (s): 0.68 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.570772E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.657 | TFLOPs: 22.73 | 31: iteration 205300/ 476837 | consumed samples: 52556800 | consumed tokens: 107636326400 | elapsed time per iteration (s): 0.68 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.563325E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.818 | TFLOPs: 22.80 | 31: iteration 205400/ 476837 | consumed samples: 52582400 | consumed tokens: 107688755200 | elapsed time per iteration (s): 0.73 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.566848E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 350.845 | TFLOPs: 21.23 | 31: iteration 205500/ 476837 | consumed samples: 52608000 | consumed tokens: 107741184000 | elapsed time per iteration (s): 0.72 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.565056E+00 | grad 
norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 355.060 | TFLOPs: 21.48 | 31: iteration 205600/ 476837 | consumed samples: 52633600 | consumed tokens: 107793612800 | elapsed time per iteration (s): 0.74 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.568551E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 348.036 | TFLOPs: 21.06 | 31: iteration 205700/ 476837 | consumed samples: 52659200 | consumed tokens: 107846041600 | elapsed time per iteration (s): 0.69 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.567813E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.368 | TFLOPs: 22.47 | 31: iteration 205800/ 476837 | consumed samples: 52684800 | consumed tokens: 107898470400 | elapsed time per iteration (s): 0.68 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.567901E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.467 | TFLOPs: 22.78 | 31: iteration 205900/ 476837 | consumed samples: 52710400 | consumed tokens: 107950899200 | elapsed time per iteration (s): 0.68 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.564252E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.367 | TFLOPs: 22.77 | 0: [2023-04-27 12:36:06,720] [INFO] [logging.py:68:log_dist] [Rank 0] step=206000, skipped=0, lr=[0.00013065908307812084, 0.00013065908307812084, 0.00013065908307812084], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 206000/ 476837 | consumed samples: 52736000 | consumed tokens: 108003328000 | elapsed time per iteration (s): 0.68 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.567752E+00 | grad norm: 0.326 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.244 | TFLOPs: 22.76 | 0: steps: 206000 loss: 2.5629 iter time (s): 0.692 samples/sec: 369.833 31: iteration 206100/ 476837 | consumed samples: 52761600 | consumed tokens: 108055756800 | elapsed time per iteration (s): 0.68 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 2.568821E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.598 | TFLOPs: 22.78 | 31: iteration 206200/ 476837 | consumed samples: 52787200 | consumed tokens: 108108185600 | elapsed time per iteration (s): 0.70 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.563988E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 363.283 | TFLOPs: 21.98 | 31: iteration 206300/ 476837 | consumed samples: 52812800 | consumed tokens: 108160614400 | elapsed time per iteration (s): 0.68 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.564758E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.739 | TFLOPs: 22.79 | 31: iteration 206400/ 476837 | consumed samples: 52838400 | consumed tokens: 108213043200 | elapsed time per iteration (s): 0.71 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.566154E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 359.740 | TFLOPs: 21.76 | 31: iteration 206500/ 476837 | consumed samples: 52864000 | consumed tokens: 108265472000 | elapsed time per iteration (s): 0.70 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.567561E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.565 | TFLOPs: 22.12 | 31: iteration 206600/ 476837 | 
consumed samples: 52889600 | consumed tokens: 108317900800 | elapsed time per iteration (s): 0.72 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.568555E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 355.710 | TFLOPs: 21.52 | 31: iteration 206700/ 476837 | consumed samples: 52915200 | consumed tokens: 108370329600 | elapsed time per iteration (s): 0.68 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.562842E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.423 | TFLOPs: 22.71 | 31: iteration 206800/ 476837 | consumed samples: 52940800 | consumed tokens: 108422758400 | elapsed time per iteration (s): 0.68 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.568155E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.295 | TFLOPs: 22.70 | 31: iteration 206900/ 476837 | consumed samples: 52966400 | consumed tokens: 108475187200 | elapsed time per iteration (s): 0.68 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.560077E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.090 | TFLOPs: 22.75 | 31: iteration 207000/ 476837 | consumed samples: 52992000 | consumed tokens: 108527616000 | elapsed time per iteration (s): 0.68 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.563440E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.276 | TFLOPs: 22.70 | 31: iteration 207100/ 476837 | consumed samples: 53017600 | consumed tokens: 108580044800 | elapsed time per iteration (s): 0.69 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.562754E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 370.493 | TFLOPs: 22.41 | 31: iteration 207200/ 476837 | consumed samples: 53043200 | consumed tokens: 108632473600 | elapsed time per iteration (s): 0.68 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.562842E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.641 | TFLOPs: 22.79 | 31: iteration 207300/ 476837 | consumed samples: 53068800 | consumed tokens: 108684902400 | elapsed time per iteration (s): 0.68 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.563448E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.702 | TFLOPs: 22.79 | 31: iteration 207400/ 476837 | consumed samples: 53094400 | consumed tokens: 108737331200 | elapsed time per iteration (s): 0.74 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.563792E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 346.717 | TFLOPs: 20.98 | 31: iteration 207500/ 476837 | consumed samples: 53120000 | consumed tokens: 108789760000 | elapsed time per iteration (s): 0.68 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.568599E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.675 | TFLOPs: 22.73 | 31: iteration 207600/ 476837 | consumed samples: 53145600 | consumed tokens: 108842188800 | elapsed time per iteration (s): 0.74 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 2.566084E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 347.428 | TFLOPs: 21.02 | 31: iteration 207700/ 476837 | consumed samples: 53171200 | consumed tokens: 108894617600 | elapsed time per iteration (s): 0.68 | learning 
rate: 1.297E-04 | global batch size: 256 | lm loss: 2.563960E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.600 | TFLOPs: 22.78 | 31: iteration 207800/ 476837 | consumed samples: 53196800 | consumed tokens: 108947046400 | elapsed time per iteration (s): 0.68 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.567206E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.577 | TFLOPs: 22.72 | 31: iteration 207900/ 476837 | consumed samples: 53222400 | consumed tokens: 108999475200 | elapsed time per iteration (s): 0.68 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.562057E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.646 | TFLOPs: 22.79 | 0: [2023-04-27 12:59:11,720] [INFO] [logging.py:68:log_dist] [Rank 0] step=208000, skipped=0, lr=[0.00012949138331627775, 0.00012949138331627775, 0.00012949138331627775], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 208000/ 476837 | consumed samples: 53248000 | consumed tokens: 109051904000 | elapsed time per iteration (s): 0.68 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.562927E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.694 | TFLOPs: 22.79 | 0: steps: 208000 loss: 2.5402 iter time (s): 0.689 samples/sec: 371.573 31: iteration 208100/ 476837 | consumed samples: 53273600 | consumed tokens: 109104332800 | elapsed time per iteration (s): 0.68 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.564844E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.328 | TFLOPs: 22.65 | 31: iteration 208200/ 476837 | consumed samples: 53299200 | consumed tokens: 
109156761600 | elapsed time per iteration (s): 0.69 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.563367E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.969 | TFLOPs: 22.56 | 31: iteration 208300/ 476837 | consumed samples: 53324800 | consumed tokens: 109209190400 | elapsed time per iteration (s): 0.69 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.561122E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.166 | TFLOPs: 22.58 | 31: iteration 208400/ 476837 | consumed samples: 53350400 | consumed tokens: 109261619200 | elapsed time per iteration (s): 0.68 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.563781E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.206 | TFLOPs: 22.70 | 31: iteration 208500/ 476837 | consumed samples: 53376000 | consumed tokens: 109314048000 | elapsed time per iteration (s): 0.69 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.562418E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.954 | TFLOPs: 22.38 | 31: iteration 208600/ 476837 | consumed samples: 53401600 | consumed tokens: 109366476800 | elapsed time per iteration (s): 0.68 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.562051E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.780 | TFLOPs: 22.79 | 31: iteration 208700/ 476837 | consumed samples: 53427200 | consumed tokens: 109418905600 | elapsed time per iteration (s): 0.71 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.562339E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 358.860 | TFLOPs: 21.71 | 31: iteration 208800/ 476837 | consumed samples: 53452800 | consumed tokens: 109471334400 | elapsed time per iteration (s): 0.68 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.564631E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.745 | TFLOPs: 22.67 | 31: iteration 208900/ 476837 | consumed samples: 53478400 | consumed tokens: 109523763200 | elapsed time per iteration (s): 0.70 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.559145E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.089 | TFLOPs: 22.03 | 31: iteration 209000/ 476837 | consumed samples: 53504000 | consumed tokens: 109576192000 | elapsed time per iteration (s): 0.68 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.564926E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.706 | TFLOPs: 22.79 | 31: iteration 209100/ 476837 | consumed samples: 53529600 | consumed tokens: 109628620800 | elapsed time per iteration (s): 0.68 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.562614E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.699 | TFLOPs: 22.79 | 31: iteration 209200/ 476837 | consumed samples: 53555200 | consumed tokens: 109681049600 | elapsed time per iteration (s): 0.68 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.562141E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.558 | TFLOPs: 22.78 | 31: iteration 209300/ 476837 | consumed samples: 53580800 | consumed tokens: 109733478400 | elapsed time per iteration (s): 0.69 | learning rate: 1.287E-04 | global batch size: 256 | lm 
loss: 2.561263E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.797 | TFLOPs: 22.49 | 31: iteration 209400/ 476837 | consumed samples: 53606400 | consumed tokens: 109785907200 | elapsed time per iteration (s): 0.68 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.563041E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.062 | TFLOPs: 22.75 | 31: iteration 209500/ 476837 | consumed samples: 53632000 | consumed tokens: 109838336000 | elapsed time per iteration (s): 0.73 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.559351E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 351.397 | TFLOPs: 21.26 | 31: iteration 209600/ 476837 | consumed samples: 53657600 | consumed tokens: 109890764800 | elapsed time per iteration (s): 0.70 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.561334E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.021 | TFLOPs: 22.02 | 31: iteration 209700/ 476837 | consumed samples: 53683200 | consumed tokens: 109943193600 | elapsed time per iteration (s): 0.68 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 2.566558E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.729 | TFLOPs: 22.79 | 31: iteration 209800/ 476837 | consumed samples: 53708800 | consumed tokens: 109995622400 | elapsed time per iteration (s): 0.69 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.563460E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.602 | TFLOPs: 22.42 | 31: iteration 209900/ 476837 | consumed samples: 53734400 | 
consumed tokens: 110048051200 | elapsed time per iteration (s): 0.68 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.562497E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.069 | TFLOPs: 22.75 | 0: [2023-04-27 13:22:11,496] [INFO] [logging.py:68:log_dist] [Rank 0] step=210000, skipped=0, lr=[0.00012832023064321605, 0.00012832023064321605, 0.00012832023064321605], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 210000/ 476837 | consumed samples: 53760000 | consumed tokens: 110100480000 | elapsed time per iteration (s): 0.70 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.564448E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 367.280 | TFLOPs: 22.22 | 0: steps: 210000 loss: 2.5070 iter time (s): 0.686 samples/sec: 372.997 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 210000 | lm loss value: 2.958612E+00 | lm loss PPL: 1.927120E+01 | 31: ------------------------------------------------------------------------------------------------- 31: iteration 210100/ 476837 | consumed samples: 53785600 | consumed tokens: 110152908800 | elapsed time per iteration (s): 0.68 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.562719E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.130 | TFLOPs: 22.75 | 31: iteration 210200/ 476837 | consumed samples: 53811200 | consumed tokens: 110205337600 | elapsed time per iteration (s): 0.71 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 2.557997E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 360.142 | TFLOPs: 21.79 | 31: iteration 210300/ 476837 | consumed samples: 
53836800 | consumed tokens: 110257766400 | elapsed time per iteration (s): 0.75 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.563976E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 343.025 | TFLOPs: 20.75 | 31: iteration 210400/ 476837 | consumed samples: 53862400 | consumed tokens: 110310195200 | elapsed time per iteration (s): 0.68 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.562835E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.714 | TFLOPs: 22.79 | 31: iteration 210500/ 476837 | consumed samples: 53888000 | consumed tokens: 110362624000 | elapsed time per iteration (s): 0.68 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.562282E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.957 | TFLOPs: 22.80 | 31: iteration 210600/ 476837 | consumed samples: 53913600 | consumed tokens: 110415052800 | elapsed time per iteration (s): 0.68 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.557001E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.180 | TFLOPs: 22.82 | 31: iteration 210700/ 476837 | consumed samples: 53939200 | consumed tokens: 110467481600 | elapsed time per iteration (s): 0.68 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.559119E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.166 | TFLOPs: 22.82 | 31: iteration 210800/ 476837 | consumed samples: 53964800 | consumed tokens: 110519910400 | elapsed time per iteration (s): 0.68 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.560270E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 376.753 | TFLOPs: 22.79 | 31: iteration 210900/ 476837 | consumed samples: 53990400 | consumed tokens: 110572339200 | elapsed time per iteration (s): 0.68 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.559727E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.963 | TFLOPs: 22.81 | 31: iteration 211000/ 476837 | consumed samples: 54016000 | consumed tokens: 110624768000 | elapsed time per iteration (s): 0.72 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.555139E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 355.090 | TFLOPs: 21.48 | 31: iteration 211100/ 476837 | consumed samples: 54041600 | consumed tokens: 110677196800 | elapsed time per iteration (s): 0.68 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.561586E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.928 | TFLOPs: 22.74 | 31: iteration 211200/ 476837 | consumed samples: 54067200 | consumed tokens: 110729625600 | elapsed time per iteration (s): 0.69 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.558523E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.329 | TFLOPs: 22.59 | 31: iteration 211300/ 476837 | consumed samples: 54092800 | consumed tokens: 110782054400 | elapsed time per iteration (s): 0.71 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.559882E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 362.828 | TFLOPs: 21.95 | 31: iteration 211400/ 476837 | consumed samples: 54118400 | consumed tokens: 110834483200 | elapsed time per iteration (s): 0.71 | learning rate: 1.275E-04 | global 
batch size: 256 | lm loss: 2.562867E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 360.613 | TFLOPs: 21.82 | 31: iteration 211500/ 476837 | consumed samples: 54144000 | consumed tokens: 110886912000 | elapsed time per iteration (s): 0.74 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.564100E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 346.277 | TFLOPs: 20.95 | 31: iteration 211600/ 476837 | consumed samples: 54169600 | consumed tokens: 110939340800 | elapsed time per iteration (s): 0.68 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.560405E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.570 | TFLOPs: 22.72 | 31: iteration 211700/ 476837 | consumed samples: 54195200 | consumed tokens: 110991769600 | elapsed time per iteration (s): 0.71 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.563873E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.703 | TFLOPs: 21.88 | 31: iteration 211800/ 476837 | consumed samples: 54220800 | consumed tokens: 111044198400 | elapsed time per iteration (s): 0.68 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.557799E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.470 | TFLOPs: 22.78 | 31: iteration 211900/ 476837 | consumed samples: 54246400 | consumed tokens: 111096627200 | elapsed time per iteration (s): 0.68 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.561015E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.817 | TFLOPs: 22.80 | 0: [2023-04-27 13:45:26,733] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=212000, skipped=0, lr=[0.00012714583252939276, 0.00012714583252939276, 0.00012714583252939276], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 212000/ 476837 | consumed samples: 54272000 | consumed tokens: 111149056000 | elapsed time per iteration (s): 0.75 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.560416E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 341.931 | TFLOPs: 20.69 | 0: steps: 212000 loss: 2.5455 iter time (s): 0.694 samples/sec: 368.892 31: iteration 212100/ 476837 | consumed samples: 54297600 | consumed tokens: 111201484800 | elapsed time per iteration (s): 0.68 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.562953E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.746 | TFLOPs: 22.79 | 31: iteration 212200/ 476837 | consumed samples: 54323200 | consumed tokens: 111253913600 | elapsed time per iteration (s): 0.69 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.562103E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.829 | TFLOPs: 22.31 | 31: iteration 212300/ 476837 | consumed samples: 54348800 | consumed tokens: 111306342400 | elapsed time per iteration (s): 0.68 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.563719E+00 | grad norm: 0.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.205 | TFLOPs: 22.70 | 31: iteration 212400/ 476837 | consumed samples: 54374400 | consumed tokens: 111358771200 | elapsed time per iteration (s): 0.68 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.581973E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
376.932 | TFLOPs: 22.80 | 31: iteration 212500/ 476837 | consumed samples: 54400000 | consumed tokens: 111411200000 | elapsed time per iteration (s): 0.68 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.565529E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.305 | TFLOPs: 22.77 | 31: iteration 212600/ 476837 | consumed samples: 54425600 | consumed tokens: 111463628800 | elapsed time per iteration (s): 0.68 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.554341E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.897 | TFLOPs: 22.80 | 31: iteration 212700/ 476837 | consumed samples: 54451200 | consumed tokens: 111516057600 | elapsed time per iteration (s): 0.70 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.560557E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.961 | TFLOPs: 22.08 | 31: iteration 212800/ 476837 | consumed samples: 54476800 | consumed tokens: 111568486400 | elapsed time per iteration (s): 0.68 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.561891E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.070 | TFLOPs: 22.81 | 31: iteration 212900/ 476837 | consumed samples: 54502400 | consumed tokens: 111620915200 | elapsed time per iteration (s): 0.68 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.562360E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.030 | TFLOPs: 22.81 | 31: iteration 213000/ 476837 | consumed samples: 54528000 | consumed tokens: 111673344000 | elapsed time per iteration (s): 0.77 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.557746E+00 | 
grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 333.458 | TFLOPs: 20.17 | 31: iteration 213100/ 476837 | consumed samples: 54553600 | consumed tokens: 111725772800 | elapsed time per iteration (s): 0.68 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.561114E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.554 | TFLOPs: 22.66 | 31: iteration 213200/ 476837 | consumed samples: 54579200 | consumed tokens: 111778201600 | elapsed time per iteration (s): 0.69 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.564521E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.615 | TFLOPs: 22.42 | 31: iteration 213300/ 476837 | consumed samples: 54604800 | consumed tokens: 111830630400 | elapsed time per iteration (s): 0.75 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.556702E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 340.461 | TFLOPs: 20.60 | 31: iteration 213400/ 476837 | consumed samples: 54630400 | consumed tokens: 111883059200 | elapsed time per iteration (s): 0.68 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.558709E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.074 | TFLOPs: 22.81 | 31: iteration 213500/ 476837 | consumed samples: 54656000 | consumed tokens: 111935488000 | elapsed time per iteration (s): 0.74 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.554947E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 345.353 | TFLOPs: 20.89 | 31: iteration 213600/ 476837 | consumed samples: 54681600 | consumed tokens: 
111987916800 | elapsed time per iteration (s): 0.69 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.556800E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.529 | TFLOPs: 22.48 | 31: iteration 213700/ 476837 | consumed samples: 54707200 | consumed tokens: 112040345600 | elapsed time per iteration (s): 0.68 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 2.557687E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.144 | TFLOPs: 22.82 | 31: iteration 213800/ 476837 | consumed samples: 54732800 | consumed tokens: 112092774400 | elapsed time per iteration (s): 0.82 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 2.554347E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 311.840 | TFLOPs: 18.87 | 31: iteration 213900/ 476837 | consumed samples: 54758400 | consumed tokens: 112145203200 | elapsed time per iteration (s): 0.69 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.559407E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.573 | TFLOPs: 22.60 | 0: [2023-04-27 14:08:51,350] [INFO] [logging.py:68:log_dist] [Rank 0] step=214000, skipped=0, lr=[0.00012596839702019673, 0.00012596839702019673, 0.00012596839702019673], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 214000/ 476837 | consumed samples: 54784000 | consumed tokens: 112197632000 | elapsed time per iteration (s): 0.70 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.554173E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 363.559 | TFLOPs: 21.99 | 0: steps: 214000 loss: 2.5339 iter time (s): 0.699 samples/sec: 366.214 31: iteration 214100/ 
476837 | consumed samples: 54809600 | consumed tokens: 112250060800 | elapsed time per iteration (s): 0.68 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.560313E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.564 | TFLOPs: 22.66 | 31: iteration 214200/ 476837 | consumed samples: 54835200 | consumed tokens: 112302489600 | elapsed time per iteration (s): 0.68 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.557115E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.905 | TFLOPs: 22.80 | 31: iteration 214300/ 476837 | consumed samples: 54860800 | consumed tokens: 112354918400 | elapsed time per iteration (s): 0.68 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.559063E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.363 | TFLOPs: 22.77 | 31: iteration 214400/ 476837 | consumed samples: 54886400 | consumed tokens: 112407347200 | elapsed time per iteration (s): 0.68 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.563183E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.253 | TFLOPs: 22.76 | 31: iteration 214500/ 476837 | consumed samples: 54912000 | consumed tokens: 112459776000 | elapsed time per iteration (s): 0.68 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.560048E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.018 | TFLOPs: 22.81 | 31: iteration 214600/ 476837 | consumed samples: 54937600 | consumed tokens: 112512204800 | elapsed time per iteration (s): 0.68 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.558911E+00 | grad norm: 0.321 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.458 | TFLOPs: 22.77 | 31: iteration 214700/ 476837 | consumed samples: 54963200 | consumed tokens: 112564633600 | elapsed time per iteration (s): 0.68 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.555716E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.140 | TFLOPs: 22.82 | 31: iteration 214800/ 476837 | consumed samples: 54988800 | consumed tokens: 112617062400 | elapsed time per iteration (s): 0.68 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 2.560570E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.146 | TFLOPs: 22.82 | 31: iteration 214900/ 476837 | consumed samples: 55014400 | consumed tokens: 112669491200 | elapsed time per iteration (s): 0.74 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.559666E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 347.751 | TFLOPs: 21.04 | 31: iteration 215000/ 476837 | consumed samples: 55040000 | consumed tokens: 112721920000 | elapsed time per iteration (s): 0.68 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.558284E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.449 | TFLOPs: 22.77 | 31: iteration 215100/ 476837 | consumed samples: 55065600 | consumed tokens: 112774348800 | elapsed time per iteration (s): 0.79 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.554091E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 325.848 | TFLOPs: 19.71 | 31: iteration 215200/ 476837 | consumed samples: 55091200 | consumed tokens: 112826777600 | elapsed time per iteration (s): 0.68 | 
learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.561437E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.240 | TFLOPs: 22.82 | 31: iteration 215300/ 476837 | consumed samples: 55116800 | consumed tokens: 112879206400 | elapsed time per iteration (s): 0.68 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.556102E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.341 | TFLOPs: 22.83 | 31: iteration 215400/ 476837 | consumed samples: 55142400 | consumed tokens: 112931635200 | elapsed time per iteration (s): 0.76 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.556445E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 336.054 | TFLOPs: 20.33 | 31: iteration 215500/ 476837 | consumed samples: 55168000 | consumed tokens: 112984064000 | elapsed time per iteration (s): 0.69 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.558114E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.829 | TFLOPs: 22.43 | 31: iteration 215600/ 476837 | consumed samples: 55193600 | consumed tokens: 113036492800 | elapsed time per iteration (s): 0.74 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.556450E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 346.371 | TFLOPs: 20.95 | 31: iteration 215700/ 476837 | consumed samples: 55219200 | consumed tokens: 113088921600 | elapsed time per iteration (s): 0.68 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.560131E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.757 | TFLOPs: 22.79 | 31: 
iteration 215800/ 476837 | consumed samples: 55244800 | consumed tokens: 113141350400 | elapsed time per iteration (s): 0.68 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.558131E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.053 | TFLOPs: 22.81 | 31: iteration 215900/ 476837 | consumed samples: 55270400 | consumed tokens: 113193779200 | elapsed time per iteration (s): 0.73 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.558946E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 352.639 | TFLOPs: 21.33 | 0: [2023-04-27 14:32:06,689] [INFO] [logging.py:68:log_dist] [Rank 0] step=216000, skipped=0, lr=[0.0001247881326990935, 0.0001247881326990935, 0.0001247881326990935], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 216000/ 476837 | consumed samples: 55296000 | consumed tokens: 113246208000 | elapsed time per iteration (s): 0.68 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.557929E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.036 | TFLOPs: 22.81 | 0: steps: 216000 loss: 2.5522 iter time (s): 0.694 samples/sec: 368.676 31: iteration 216100/ 476837 | consumed samples: 55321600 | consumed tokens: 113298636800 | elapsed time per iteration (s): 0.68 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.556239E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.902 | TFLOPs: 22.80 | 31: iteration 216200/ 476837 | consumed samples: 55347200 | consumed tokens: 113351065600 | elapsed time per iteration (s): 0.68 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.556576E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 376.953 | TFLOPs: 22.80 | 31: iteration 216300/ 476837 | consumed samples: 55372800 | consumed tokens: 113403494400 | elapsed time per iteration (s): 0.70 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.556951E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.491 | TFLOPs: 22.17 | 31: iteration 216400/ 476837 | consumed samples: 55398400 | consumed tokens: 113455923200 | elapsed time per iteration (s): 0.73 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.558876E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 349.912 | TFLOPs: 21.17 | 31: iteration 216500/ 476837 | consumed samples: 55424000 | consumed tokens: 113508352000 | elapsed time per iteration (s): 0.68 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.559198E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.903 | TFLOPs: 22.80 | 31: iteration 216600/ 476837 | consumed samples: 55449600 | consumed tokens: 113560780800 | elapsed time per iteration (s): 0.68 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.560432E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.898 | TFLOPs: 22.80 | 31: iteration 216700/ 476837 | consumed samples: 55475200 | consumed tokens: 113613209600 | elapsed time per iteration (s): 0.68 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.557716E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.791 | TFLOPs: 22.73 | 31: iteration 216800/ 476837 | consumed samples: 55500800 | consumed tokens: 113665638400 | elapsed time per iteration (s): 0.69 | learning rate: 1.243E-04 | global batch size: 256 | lm 
loss: 2.558394E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.618 | TFLOPs: 22.60 | 31: iteration 216900/ 476837 | consumed samples: 55526400 | consumed tokens: 113718067200 | elapsed time per iteration (s): 0.69 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 2.554942E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.759 | TFLOPs: 22.55 | 31: iteration 217000/ 476837 | consumed samples: 55552000 | consumed tokens: 113770496000 | elapsed time per iteration (s): 0.68 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.553888E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.874 | TFLOPs: 22.80 | 31: iteration 217100/ 476837 | consumed samples: 55577600 | consumed tokens: 113822924800 | elapsed time per iteration (s): 0.68 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.558960E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.305 | TFLOPs: 22.64 | 31: iteration 217200/ 476837 | consumed samples: 55603200 | consumed tokens: 113875353600 | elapsed time per iteration (s): 0.68 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.550374E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.588 | TFLOPs: 22.78 | 31: iteration 217300/ 476837 | consumed samples: 55628800 | consumed tokens: 113927782400 | elapsed time per iteration (s): 0.68 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.554641E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.916 | TFLOPs: 22.80 | 31: iteration 217400/ 476837 | consumed samples: 55654400 | 
consumed tokens: 113980211200 | elapsed time per iteration (s): 0.68 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.552712E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.899 | TFLOPs: 22.80 | 31: iteration 217500/ 476837 | consumed samples: 55680000 | consumed tokens: 114032640000 | elapsed time per iteration (s): 0.68 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.550998E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.705 | TFLOPs: 22.79 | 31: iteration 217600/ 476837 | consumed samples: 55705600 | consumed tokens: 114085068800 | elapsed time per iteration (s): 0.75 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.555292E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 341.298 | TFLOPs: 20.65 | 31: iteration 217700/ 476837 | consumed samples: 55731200 | consumed tokens: 114137497600 | elapsed time per iteration (s): 0.68 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.551547E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.933 | TFLOPs: 22.80 | 31: iteration 217800/ 476837 | consumed samples: 55756800 | consumed tokens: 114189926400 | elapsed time per iteration (s): 0.71 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.557502E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 362.506 | TFLOPs: 21.93 | 31: iteration 217900/ 476837 | consumed samples: 55782400 | consumed tokens: 114242355200 | elapsed time per iteration (s): 0.68 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.555723E+00 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 376.782 | TFLOPs: 22.79 | 0: [2023-04-27 14:55:04,225] [INFO] [logging.py:68:log_dist] [Rank 0] step=218000, skipped=0, lr=[0.0001236052486506743, 0.0001236052486506743, 0.0001236052486506743], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 218000/ 476837 | consumed samples: 55808000 | consumed tokens: 114294784000 | elapsed time per iteration (s): 0.68 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.553831E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.800 | TFLOPs: 22.80 | 0: steps: 218000 loss: 2.5104 iter time (s): 0.685 samples/sec: 373.622 31: iteration 218100/ 476837 | consumed samples: 55833600 | consumed tokens: 114347212800 | elapsed time per iteration (s): 0.68 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.553238E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.980 | TFLOPs: 22.75 | 31: iteration 218200/ 476837 | consumed samples: 55859200 | consumed tokens: 114399641600 | elapsed time per iteration (s): 0.68 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.557342E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.691 | TFLOPs: 22.79 | 31: iteration 218300/ 476837 | consumed samples: 55884800 | consumed tokens: 114452070400 | elapsed time per iteration (s): 0.68 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.557658E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.262 | TFLOPs: 22.70 | 31: iteration 218400/ 476837 | consumed samples: 55910400 | consumed tokens: 114504499200 | elapsed time per iteration (s): 0.70 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.557435E+00 | grad norm: 0.355 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 367.736 | TFLOPs: 22.25 | 31: iteration 218500/ 476837 | consumed samples: 55936000 | consumed tokens: 114556928000 | elapsed time per iteration (s): 0.68 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 2.555092E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.909 | TFLOPs: 22.80 | 31: iteration 218600/ 476837 | consumed samples: 55961600 | consumed tokens: 114609356800 | elapsed time per iteration (s): 0.68 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.559062E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.943 | TFLOPs: 22.74 | 31: iteration 218700/ 476837 | consumed samples: 55987200 | consumed tokens: 114661785600 | elapsed time per iteration (s): 0.72 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.560338E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 357.165 | TFLOPs: 21.61 | 31: iteration 218800/ 476837 | consumed samples: 56012800 | consumed tokens: 114714214400 | elapsed time per iteration (s): 0.71 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.551050E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 359.718 | TFLOPs: 21.76 | 31: iteration 218900/ 476837 | consumed samples: 56038400 | consumed tokens: 114766643200 | elapsed time per iteration (s): 0.68 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.554403E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.799 | TFLOPs: 22.80 | 31: iteration 219000/ 476837 | consumed samples: 56064000 | consumed tokens: 114819072000 | elapsed time per iteration 
(s): 0.68 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.556402E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.832 | TFLOPs: 22.80 | 31: iteration 219100/ 476837 | consumed samples: 56089600 | consumed tokens: 114871500800 | elapsed time per iteration (s): 0.68 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.553291E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.432 | TFLOPs: 22.65 | 31: iteration 219200/ 476837 | consumed samples: 56115200 | consumed tokens: 114923929600 | elapsed time per iteration (s): 0.68 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.561340E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.231 | TFLOPs: 22.76 | 31: iteration 219300/ 476837 | consumed samples: 56140800 | consumed tokens: 114976358400 | elapsed time per iteration (s): 0.68 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.555792E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.776 | TFLOPs: 22.79 | 31: iteration 219400/ 476837 | consumed samples: 56166400 | consumed tokens: 115028787200 | elapsed time per iteration (s): 0.72 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.548763E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 353.963 | TFLOPs: 21.41 | 31: iteration 219500/ 476837 | consumed samples: 56192000 | consumed tokens: 115081216000 | elapsed time per iteration (s): 0.68 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.548991E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.282 | TFLOPs: 22.64 | 
31: iteration 219600/ 476837 | consumed samples: 56217600 | consumed tokens: 115133644800 | elapsed time per iteration (s): 0.69 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.554833E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.709 | TFLOPs: 22.37 | 31: iteration 219700/ 476837 | consumed samples: 56243200 | consumed tokens: 115186073600 | elapsed time per iteration (s): 0.76 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 2.555227E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 335.234 | TFLOPs: 20.28 | 31: iteration 219800/ 476837 | consumed samples: 56268800 | consumed tokens: 115238502400 | elapsed time per iteration (s): 0.68 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.553063E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.858 | TFLOPs: 22.80 | 31: iteration 219900/ 476837 | consumed samples: 56294400 | consumed tokens: 115290931200 | elapsed time per iteration (s): 0.68 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.550723E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.528 | TFLOPs: 22.78 | 0: [2023-04-27 15:18:07,358] [INFO] [logging.py:68:log_dist] [Rank 0] step=220000, skipped=0, lr=[0.00012241995442361696, 0.00012241995442361696, 0.00012241995442361696], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 220000/ 476837 | consumed samples: 56320000 | consumed tokens: 115343360000 | elapsed time per iteration (s): 0.68 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.550207E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.982 | TFLOPs: 22.81 |
0: steps: 220000 loss: 2.5216 iter time (s): 0.688 samples/sec: 371.952 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 220000 | lm loss value: 2.931572E+00 | lm loss PPL: 1.875709E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 220000 to checkpoints_1b1250b1b5 0: [2023-04-27 15:18:09,992] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step220000 is begin to save! 0: [2023-04-27 15:18:10,047] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_01-model_00-model_states.pt... 0: [2023-04-27 15:18:10,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_01-model_00-model_states.pt. 0: [2023-04-27 15:18:10,791] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_03-model_00-model_states.pt... 0: [2023-04-27 15:18:10,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_03-model_00-model_states.pt. 0: [2023-04-27 15:18:10,882] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_04-model_00-model_states.pt... 0: [2023-04-27 15:18:10,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_04-model_00-model_states.pt. 0: [2023-04-27 15:18:10,974] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_05-model_00-model_states.pt... 0: [2023-04-27 15:18:11,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_05-model_00-model_states.pt. 
0: [2023-04-27 15:18:11,069] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_06-model_00-model_states.pt... 0: [2023-04-27 15:18:11,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_06-model_00-model_states.pt. 0: [2023-04-27 15:18:11,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_07-model_00-model_states.pt... 0: [2023-04-27 15:18:11,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_07-model_00-model_states.pt. 0: [2023-04-27 15:18:11,256] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_08-model_00-model_states.pt... 0: [2023-04-27 15:18:11,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_08-model_00-model_states.pt. 0: [2023-04-27 15:18:11,354] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_09-model_00-model_states.pt... 0: [2023-04-27 15:18:11,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_09-model_00-model_states.pt. 0: [2023-04-27 15:18:11,450] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_10-model_00-model_states.pt... 0: [2023-04-27 15:18:11,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_10-model_00-model_states.pt. 0: [2023-04-27 15:18:11,541] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_11-model_00-model_states.pt... 0: [2023-04-27 15:18:11,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_11-model_00-model_states.pt. 
0: [2023-04-27 15:18:11,631] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_12-model_00-model_states.pt... 0: [2023-04-27 15:18:11,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_12-model_00-model_states.pt. 0: [2023-04-27 15:18:11,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_13-model_00-model_states.pt... 0: [2023-04-27 15:18:11,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_13-model_00-model_states.pt. 0: [2023-04-27 15:18:11,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_14-model_00-model_states.pt... 0: [2023-04-27 15:18:11,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_14-model_00-model_states.pt. 0: [2023-04-27 15:18:11,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_15-model_00-model_states.pt... 0: [2023-04-27 15:18:11,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_15-model_00-model_states.pt. 0: [2023-04-27 15:18:11,989] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_16-model_00-model_states.pt... 0: [2023-04-27 15:18:12,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_16-model_00-model_states.pt. 0: [2023-04-27 15:18:12,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_17-model_00-model_states.pt... 0: [2023-04-27 15:18:12,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_17-model_00-model_states.pt. 
0: [2023-04-27 15:18:12,170] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_18-model_00-model_states.pt... 0: [2023-04-27 15:18:12,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_18-model_00-model_states.pt. 0: [2023-04-27 15:18:12,268] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_19-model_00-model_states.pt... 0: [2023-04-27 15:18:12,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_19-model_00-model_states.pt. 0: [2023-04-27 15:18:12,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_20-model_00-model_states.pt... 0: [2023-04-27 15:18:12,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_20-model_00-model_states.pt. 0: [2023-04-27 15:18:12,519] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_21-model_00-model_states.pt... 0: [2023-04-27 15:18:12,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_21-model_00-model_states.pt. 0: [2023-04-27 15:18:12,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_22-model_00-model_states.pt... 0: [2023-04-27 15:18:12,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_22-model_00-model_states.pt. 0: [2023-04-27 15:18:12,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_23-model_00-model_states.pt... 0: [2023-04-27 15:18:12,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_23-model_00-model_states.pt. 
0: [2023-04-27 15:18:12,805] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_24-model_00-model_states.pt... 0: [2023-04-27 15:18:12,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_24-model_00-model_states.pt. 0: [2023-04-27 15:18:12,894] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_25-model_00-model_states.pt... 0: [2023-04-27 15:18:12,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_25-model_00-model_states.pt. 0: [2023-04-27 15:18:12,988] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_26-model_00-model_states.pt... 0: [2023-04-27 15:18:13,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_26-model_00-model_states.pt. 0: [2023-04-27 15:18:13,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_27-model_00-model_states.pt... 0: [2023-04-27 15:18:13,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_27-model_00-model_states.pt. 0: [2023-04-27 15:18:13,166] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_28-model_00-model_states.pt... 0: [2023-04-27 15:18:13,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_28-model_00-model_states.pt. 0: [2023-04-27 15:18:13,255] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/layer_30-model_00-model_states.pt... 0: [2023-04-27 15:18:13,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/layer_30-model_00-model_states.pt. 
0: [2023-04-27 15:18:13,258] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step220000/mp_rank_00_model_states.pt 0: [2023-04-27 15:18:13,258] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/mp_rank_00_model_states.pt... 0: [2023-04-27 15:18:13,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/mp_rank_00_model_states.pt. 0: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 3: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 13: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 13: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 17: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 
17: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 23: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 23: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 23: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 29: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 29: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 29: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 29: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 28: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 28: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 
30: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 30: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 30: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 31: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 31: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 31: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 16: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 16: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 16: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 
22: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 22: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 0: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 4: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
7: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 2: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 8: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 
8: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 8: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 11: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 11: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 11: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 11: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 3: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 10: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 10: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 10: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 10: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 
9: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 9: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 9: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 9: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 14: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 14: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 15: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 15: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 12: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 12: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 
13: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 20: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 20: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 20: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 20: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 19: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 19: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 19: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 19: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 18: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 
18: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 18: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 18: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 24: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 24: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 27: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 27: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 21: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 
21: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 23: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 25: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 25: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 28: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 26: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 26: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 30: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 16: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 
16: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 16: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 16: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 16: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 22: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 6: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 4: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
1: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 2: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 8: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 11: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 
3: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 10: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 10: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 10: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 10: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 9: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 9: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 9: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 14: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 14: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 15: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 
12: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 12: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 13: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 20: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 20: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 20: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 20: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 19: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 19: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 18: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 18: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 
18: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 24: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 17: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 17: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 17: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 17: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 17: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 27: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 27: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 27: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 
21: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 23: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 23: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 23: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 29: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 29: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 25: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 28: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 26: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 
30: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 30: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 30: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 31: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 31: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 31: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 31: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 22: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 22: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 6: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
0: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 7: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 8: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 11: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 3: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 9: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 14: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 
14: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 14: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 15: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 15: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 15: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 12: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 12: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 12: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 13: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 13: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 
19: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 18: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 24: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 24: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 24: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 24: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 24: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 17: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 27: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 21: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 
23: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 29: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 25: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 25: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 28: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 28: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 28: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 26: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 30: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 
22: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 22: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 6: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 4: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 7: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 8: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 11: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 14: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 15: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 
12: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 19: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 27: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 21: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 21: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 25: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 28: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 26: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 4: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 8: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 11: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 
21: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 25: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 26: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 8: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 26: [2023-04-27 15:18:13,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 0: [2023-04-27 15:18:13,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-27 15:18:13,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-27 15:18:13,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 24: [2023-04-27 15:18:13,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-27 15:18:13,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-27 15:18:13,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
16: [2023-04-27 15:18:13,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-27 15:18:13,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-27 15:18:13,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 14: [2023-04-27 15:18:13,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2023-04-27 15:18:13,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-27 15:18:13,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 24: [2023-04-27 15:18:13,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 8: [2023-04-27 15:18:13,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 4: [2023-04-27 15:18:13,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-27 15:18:13,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-04-27 15:18:13,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
24: [2023-04-27 15:18:13,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-27 15:18:13,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 0: [2023-04-27 15:18:13,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-04-27 15:18:13,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-04-27 15:18:13,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-27 15:18:13,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-27 15:18:13,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 0: [2023-04-27 15:18:13,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 8: [2023-04-27 15:18:13,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-27 15:18:13,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 8: [2023-04-27 15:18:13,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 
8: [2023-04-27 15:18:13,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-27 15:18:13,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 8: [2023-04-27 15:18:13,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 8: [2023-04-27 15:18:13,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-27 15:18:13,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 8: [2023-04-27 15:18:13,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 8: [2023-04-27 15:18:13,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-27 15:18:13,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 0: [2023-04-27 15:18:13,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-04-27 15:18:13,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-27 15:18:13,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-27 15:18:13,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
8: [2023-04-27 15:18:13,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 14: [2023-04-27 15:18:13,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 14: [2023-04-27 15:18:13,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2023-04-27 15:18:13,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 0: [2023-04-27 15:18:13,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-27 15:18:13,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-27 15:18:13,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 24: [2023-04-27 15:18:13,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 24: [2023-04-27 15:18:13,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-27 15:18:13,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 0: [2023-04-27 15:18:13,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-04-27 15:18:13,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-27 15:18:13,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 0: [2023-04-27 15:18:13,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-27 15:18:13,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-27 15:18:13,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 4: [2023-04-27 15:18:13,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-27 15:18:13,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-27 15:18:13,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-04-27 15:18:13,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 4: [2023-04-27 15:18:13,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 4: [2023-04-27 15:18:13,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 4: [2023-04-27 15:18:13,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-04-27 15:18:13,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-04-27 15:18:13,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 4: [2023-04-27 15:18:13,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-04-27 15:18:13,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-04-27 15:18:13,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 4: [2023-04-27 15:18:13,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-04-27 15:18:13,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-27 15:18:13,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 4: [2023-04-27 15:18:13,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-27 15:18:13,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-04-27 15:18:13,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 14: [2023-04-27 15:18:13,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 
14: [2023-04-27 15:18:13,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-27 15:18:13,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 10: [2023-04-27 15:18:13,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 24: [2023-04-27 15:18:13,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-27 15:18:13,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-27 15:18:13,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 10: [2023-04-27 15:18:13,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 24: [2023-04-27 15:18:13,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 10: [2023-04-27 15:18:13,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 24: [2023-04-27 15:18:13,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-27 15:18:13,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 16: [2023-04-27 15:18:13,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 
16: [2023-04-27 15:18:13,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-27 15:18:13,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 16: [2023-04-27 15:18:13,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 16: [2023-04-27 15:18:13,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-27 15:18:13,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 16: [2023-04-27 15:18:13,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-27 15:18:13,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-27 15:18:13,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 16: [2023-04-27 15:18:13,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 16: [2023-04-27 15:18:13,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-27 15:18:13,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 16: [2023-04-27 15:18:13,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 
16: [2023-04-27 15:18:13,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-27 15:18:13,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 16: [2023-04-27 15:18:13,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2023-04-27 15:18:13,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-27 15:18:13,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 24: [2023-04-27 15:18:13,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 14: [2023-04-27 15:18:13,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 24: [2023-04-27 15:18:13,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-27 15:18:13,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 24: [2023-04-27 15:18:13,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 
24: [2023-04-27 15:18:13,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 14: [2023-04-27 15:18:13,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 24: [2023-04-27 15:18:13,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 14: [2023-04-27 15:18:13,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 14: [2023-04-27 15:18:13,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-27 15:18:13,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-27 15:18:13,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 4: [2023-04-27 15:18:13,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 16: [2023-04-27 15:18:13,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 4: [2023-04-27 15:18:13,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-04-27 15:18:13,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 24: [2023-04-27 15:18:13,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 
24: [2023-04-27 15:18:13,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2023-04-27 15:18:13,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 
26: [2023-04-27 15:18:13,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-27 15:18:13,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-27 15:18:13,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-27 15:18:13,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-27 15:18:13,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-27 15:18:13,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-27 15:18:13,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 26: [2023-04-27 15:18:13,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 16: [2023-04-27 15:18:13,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-27 15:18:13,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 10: [2023-04-27 15:18:13,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-27 15:18:13,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2023-04-27 15:18:13,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 10: [2023-04-27 15:18:13,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2023-04-27 15:18:13,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 10: [2023-04-27 15:18:13,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 10: [2023-04-27 15:18:13,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-27 15:18:13,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 
10: [2023-04-27 15:18:13,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 10: [2023-04-27 15:18:13,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-27 15:18:13,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-27 15:18:13,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 10: [2023-04-27 15:18:13,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 10: [2023-04-27 15:18:13,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 10: [2023-04-27 15:18:13,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 8: [2023-04-27 15:18:13,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-27 15:18:13,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 8: [2023-04-27 15:18:13,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 8: [2023-04-27 15:18:13,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-27 15:18:13,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
8: [2023-04-27 15:18:13,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 8: [2023-04-27 15:18:13,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-27 15:18:13,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 8: [2023-04-27 15:18:13,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 8: [2023-04-27 15:18:13,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2023-04-27 15:18:13,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 0: [2023-04-27 15:18:13,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-04-27 15:18:13,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 10: [2023-04-27 15:18:13,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 10: [2023-04-27 15:18:13,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2023-04-27 15:18:13,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 10: [2023-04-27 15:18:13,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 
10: [2023-04-27 15:18:13,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-27 15:18:13,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 14: [2023-04-27 15:18:13,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2023-04-27 15:18:13,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-27 15:18:13,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2023-04-27 15:18:13,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-27 15:18:13,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 14: [2023-04-27 15:18:13,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 14: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-27 15:18:13,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 
25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 
25: [2023-04-27 15:18:13,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-27 15:18:13,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-27 15:18:13,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-27 15:18:13,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-27 15:18:13,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 25: [2023-04-27 15:18:13,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-27 15:18:13,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-27 15:18:13,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 25: [2023-04-27 15:18:13,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 27: [2023-04-27 15:18:13,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 27: [2023-04-27 15:18:13,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2023-04-27 15:18:13,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-27 15:18:13,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2023-04-27 15:18:13,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-27 15:18:13,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-27 15:18:13,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 
27: [2023-04-27 15:18:13,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 27: [2023-04-27 15:18:13,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-27 15:18:13,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-27 15:18:13,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2023-04-27 15:18:13,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-27 15:18:13,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-27 15:18:13,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-27 15:18:13,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-27 15:18:13,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 27: [2023-04-27 15:18:13,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 27: [2023-04-27 15:18:13,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
27: [2023-04-27 15:18:13,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 27: [2023-04-27 15:18:13,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 27: [2023-04-27 15:18:13,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 27: [2023-04-27 15:18:13,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 27: [2023-04-27 15:18:13,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-27 15:18:13,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 3: [2023-04-27 15:18:13,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-27 15:18:13,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-27 15:18:13,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-27 15:18:13,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-27 15:18:13,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 3: [2023-04-27 15:18:13,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 3: [2023-04-27 15:18:13,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-04-27 15:18:13,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-04-27 15:18:13,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-04-27 15:18:13,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-27 15:18:13,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-27 15:18:13,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-27 15:18:13,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-27 15:18:13,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 3: [2023-04-27 15:18:13,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-27 15:18:13,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 3: [2023-04-27 15:18:13,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-27 15:18:13,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
3: [2023-04-27 15:18:13,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-27 15:18:13,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 3: [2023-04-27 15:18:13,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 13: [2023-04-27 15:18:13,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-27 15:18:13,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-27 15:18:13,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 13: [2023-04-27 15:18:13,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2023-04-27 15:18:13,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2023-04-27 15:18:13,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 
13: [2023-04-27 15:18:13,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2023-04-27 15:18:13,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2023-04-27 15:18:13,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-27 15:18:13,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 13: [2023-04-27 15:18:13,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 13: [2023-04-27 15:18:13,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 
19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-27 15:18:13,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2023-04-27 15:18:13,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-27 15:18:13,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 19: [2023-04-27 15:18:13,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-27 15:18:13,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-27 15:18:13,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
19: [2023-04-27 15:18:13,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 19: [2023-04-27 15:18:13,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 19: [2023-04-27 15:18:13,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 17: [2023-04-27 15:18:13,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-27 15:18:13,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2023-04-27 15:18:13,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 17: [2023-04-27 15:18:13,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 
17: [2023-04-27 15:18:13,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2023-04-27 15:18:13,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2023-04-27 15:18:13,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 17: [2023-04-27 15:18:13,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 17: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 17: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 17: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 17: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 17: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 17: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 17: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 12: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 
12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
12: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-27 15:18:13,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 12: [2023-04-27 15:18:13,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 6: [2023-04-27 15:18:13,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-27 15:18:13,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-04-27 15:18:13,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-27 15:18:13,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-27 15:18:13,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-27 15:18:13,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-27 15:18:13,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 6: [2023-04-27 15:18:13,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 6: [2023-04-27 15:18:13,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 6: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-04-27 15:18:13,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 
28: [2023-04-27 15:18:13,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-27 15:18:13,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-27 15:18:13,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-27 15:18:13,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-27 15:18:13,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 28: [2023-04-27 15:18:13,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
28: [2023-04-27 15:18:13,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 28: [2023-04-27 15:18:13,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 
18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 18: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 18: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
7: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 7: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 18: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 7: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-27 15:18:13,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
7: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 7: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 7: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-04-27 15:18:13,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-04-27 15:18:13,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-27 15:18:13,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-27 15:18:13,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-27 15:18:13,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 7: [2023-04-27 15:18:13,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 7: [2023-04-27 15:18:13,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 7: [2023-04-27 15:18:13,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-04-27 15:18:13,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-27 15:18:13,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 6: [2023-04-27 15:18:13,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-27 15:18:13,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-27 15:18:13,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 6: [2023-04-27 15:18:13,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-04-27 15:18:13,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-27 15:18:13,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-27 15:18:13,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 6: [2023-04-27 15:18:13,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-27 15:18:13,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 6: [2023-04-27 15:18:13,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-04-27 15:18:13,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-27 15:18:13,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 
30: [2023-04-27 15:18:13,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-27 15:18:13,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-27 15:18:13,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-27 15:18:13,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-27 15:18:13,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-27 15:18:13,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 30: [2023-04-27 15:18:13,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 30: [2023-04-27 15:18:13,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 
11: [2023-04-27 15:18:13,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-27 15:18:13,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2023-04-27 15:18:13,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-27 15:18:13,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-27 15:18:13,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-27 15:18:13,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2023-04-27 15:18:13,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-27 15:18:13,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 11: [2023-04-27 15:18:13,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 3: [2023-04-27 15:18:13,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-04-27 15:18:13,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-27 15:18:13,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 7: [2023-04-27 15:18:13,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-04-27 15:18:13,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-27 15:18:13,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 26: [2023-04-27 15:18:13,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 
26: [2023-04-27 15:18:13,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2023-04-27 15:18:13,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 13: [2023-04-27 15:18:13,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2023-04-27 15:18:13,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-27 15:18:13,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2023-04-27 15:18:13,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-27 15:18:13,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-27 15:18:13,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-27 15:18:13,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 13: [2023-04-27 15:18:13,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 13: [2023-04-27 15:18:13,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-04-27 15:18:13,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-04-27 15:18:13,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-27 15:18:13,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-27 15:18:13,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 2: [2023-04-27 15:18:13,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-27 15:18:13,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-27 15:18:13,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
2: [2023-04-27 15:18:13,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 2: [2023-04-27 15:18:13,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 13: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-27 15:18:13,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 
31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2023-04-27 15:18:13,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-27 15:18:13,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-27 15:18:13,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-27 15:18:13,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-27 15:18:13,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-27 15:18:13,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-27 15:18:13,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 31: [2023-04-27 15:18:13,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 31: [2023-04-27 15:18:13,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 
29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 20: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 
29: [2023-04-27 15:18:13,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-27 15:18:13,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2023-04-27 15:18:13,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-27 15:18:13,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 20: [2023-04-27 15:18:13,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-27 15:18:13,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-27 15:18:13,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 29: [2023-04-27 15:18:13,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-27 15:18:13,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
29: [2023-04-27 15:18:13,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 20: [2023-04-27 15:18:13,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 20: [2023-04-27 15:18:13,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 20: [2023-04-27 15:18:13,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 29: [2023-04-27 15:18:13,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 20: [2023-04-27 15:18:13,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2023-04-27 15:18:13,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-27 15:18:13,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 20: [2023-04-27 15:18:13,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 
20: [2023-04-27 15:18:13,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-27 15:18:13,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-27 15:18:13,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-27 15:18:13,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 20: [2023-04-27 15:18:13,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 20: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2023-04-27 15:18:13,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-04-27 15:18:13,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-04-27 15:18:13,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-27 15:18:13,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
5: [2023-04-27 15:18:13,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-27 15:18:13,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-27 15:18:13,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-04-27 15:18:13,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-27 15:18:13,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 5: [2023-04-27 15:18:13,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
29: [2023-04-27 15:18:13,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-27 15:18:13,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2023-04-27 15:18:13,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 28: [2023-04-27 15:18:13,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 28: [2023-04-27 15:18:13,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-27 15:18:13,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 30: [2023-04-27 15:18:13,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-27 15:18:13,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-27 15:18:13,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 
23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 
23: [2023-04-27 15:18:13,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-27 15:18:13,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-27 15:18:13,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2023-04-27 15:18:13,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-27 15:18:13,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-27 15:18:13,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-27 15:18:13,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 23: [2023-04-27 15:18:13,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 23: [2023-04-27 15:18:13,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 
9: [2023-04-27 15:18:13,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-27 15:18:13,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 9: [2023-04-27 15:18:13,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-27 15:18:13,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-27 15:18:13,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-27 15:18:13,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-27 15:18:13,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 9: [2023-04-27 15:18:13,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 15: [2023-04-27 15:18:13,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-27 15:18:13,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-27 15:18:13,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-27 15:18:13,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 15: [2023-04-27 15:18:13,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 15: [2023-04-27 15:18:13,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 15: [2023-04-27 15:18:13,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-27 15:18:13,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 
15: [2023-04-27 15:18:13,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-27 15:18:13,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-27 15:18:13,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-27 15:18:13,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-27 15:18:13,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-27 15:18:13,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-27 15:18:13,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2023-04-27 15:18:13,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2023-04-27 15:18:13,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 15: [2023-04-27 15:18:13,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 15: [2023-04-27 15:18:13,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
15: [2023-04-27 15:18:13,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 15: [2023-04-27 15:18:13,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 15: [2023-04-27 15:18:13,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 15: [2023-04-27 15:18:13,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 15: [2023-04-27 15:18:13,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 
21: [2023-04-27 15:18:13,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-27 15:18:13,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-27 15:18:13,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-27 15:18:13,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 21: [2023-04-27 15:18:13,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 
21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 21: [2023-04-27 15:18:13,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-27 15:18:13,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 21: [2023-04-27 15:18:13,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 20: [2023-04-27 15:18:13,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 20: [2023-04-27 15:18:13,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-27 15:18:13,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-27 15:18:13,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-27 15:18:13,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-27 15:18:13,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-27 15:18:13,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-04-27 15:18:13,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-27 15:18:13,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-27 15:18:13,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-27 15:18:13,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 1: [2023-04-27 15:18:13,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 22: [2023-04-27 15:18:13,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-27 15:18:13,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 
22: [2023-04-27 15:18:13,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-27 15:18:13,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 22: [2023-04-27 15:18:13,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-27 15:18:13,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2023-04-27 15:18:13,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2023-04-27 15:18:13,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 
22: [2023-04-27 15:18:13,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-27 15:18:13,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-27 15:18:13,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2023-04-27 15:18:13,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-27 15:18:13,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-27 15:18:13,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-27 15:18:13,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-27 15:18:13,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-27 15:18:13,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 22: [2023-04-27 15:18:13,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 22: [2023-04-27 15:18:13,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
22: [2023-04-27 15:18:13,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 22: [2023-04-27 15:18:13,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 22: [2023-04-27 15:18:13,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 22: [2023-04-27 15:18:13,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 22: [2023-04-27 15:18:13,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 9: [2023-04-27 15:18:20,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 9: [2023-04-27 15:18:20,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-27 15:18:20,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 21: [2023-04-27 15:18:20,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-27 15:18:20,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step220000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-27 15:18:20,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step220000 is ready now! 
0: successfully saved checkpoint at iteration 220000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 12968.93 31: iteration 220100/ 476837 | consumed samples: 56345600 | consumed tokens: 115395788800 | elapsed time per iteration (s): 0.81 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.555025E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 314.381 | TFLOPs: 19.02 | 31: iteration 220200/ 476837 | consumed samples: 56371200 | consumed tokens: 115448217600 | elapsed time per iteration (s): 0.70 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.552434E+00 | grad norm: 0.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.924 | TFLOPs: 22.20 | 31: iteration 220300/ 476837 | consumed samples: 56396800 | consumed tokens: 115500646400 | elapsed time per iteration (s): 0.68 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.555439E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.547 | TFLOPs: 22.78 | 31: iteration 220400/ 476837 | consumed samples: 56422400 | consumed tokens: 115553075200 | elapsed time per iteration (s): 0.68 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.552592E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.823 | TFLOPs: 22.80 | 31: iteration 220500/ 476837 | consumed samples: 56448000 | consumed tokens: 115605504000 | elapsed time per iteration (s): 0.69 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.551521E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.912 | TFLOPs: 22.38 | 31: iteration 220600/ 476837 | consumed samples: 56473600 | consumed tokens: 115657932800 | elapsed time per 
iteration (s): 0.68 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.552670E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.241 | TFLOPs: 22.76 | 31: iteration 220700/ 476837 | consumed samples: 56499200 | consumed tokens: 115710361600 | elapsed time per iteration (s): 0.69 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.551692E+00 | grad norm: 0.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.576 | TFLOPs: 22.48 | 31: iteration 220800/ 476837 | consumed samples: 56524800 | consumed tokens: 115762790400 | elapsed time per iteration (s): 0.68 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.554308E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.506 | TFLOPs: 22.66 | 31: iteration 220900/ 476837 | consumed samples: 56550400 | consumed tokens: 115815219200 | elapsed time per iteration (s): 0.68 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.555645E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.786 | TFLOPs: 22.73 | 31: iteration 221000/ 476837 | consumed samples: 56576000 | consumed tokens: 115867648000 | elapsed time per iteration (s): 0.68 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.553083E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.352 | TFLOPs: 22.71 | 31: iteration 221100/ 476837 | consumed samples: 56601600 | consumed tokens: 115920076800 | elapsed time per iteration (s): 0.68 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.557512E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.120 | 
TFLOPs: 22.75 | 31: iteration 221200/ 476837 | consumed samples: 56627200 | consumed tokens: 115972505600 | elapsed time per iteration (s): 0.68 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.549625E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.506 | TFLOPs: 22.72 | 31: iteration 221300/ 476837 | consumed samples: 56652800 | consumed tokens: 116024934400 | elapsed time per iteration (s): 0.68 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.549062E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.416 | TFLOPs: 22.77 | 31: iteration 221400/ 476837 | consumed samples: 56678400 | consumed tokens: 116077363200 | elapsed time per iteration (s): 0.74 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.551223E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 344.200 | TFLOPs: 20.82 | 31: iteration 221500/ 476837 | consumed samples: 56704000 | consumed tokens: 116129792000 | elapsed time per iteration (s): 0.68 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.552493E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.982 | TFLOPs: 22.81 | 31: iteration 221600/ 476837 | consumed samples: 56729600 | consumed tokens: 116182220800 | elapsed time per iteration (s): 0.68 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.549025E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.855 | TFLOPs: 22.74 | 31: iteration 221700/ 476837 | consumed samples: 56755200 | consumed tokens: 116234649600 | elapsed time per iteration (s): 0.68 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.544889E+00 | grad norm: 
0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.751 | TFLOPs: 22.79 | 31: iteration 221800/ 476837 | consumed samples: 56780800 | consumed tokens: 116287078400 | elapsed time per iteration (s): 0.77 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.554875E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 331.685 | TFLOPs: 20.07 | 31: iteration 221900/ 476837 | consumed samples: 56806400 | consumed tokens: 116339507200 | elapsed time per iteration (s): 0.68 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 2.553386E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.751 | TFLOPs: 22.79 | 0: [2023-04-27 15:41:20,974] [INFO] [logging.py:68:log_dist] [Rank 0] step=222000, skipped=0, lr=[0.00012123245999356385, 0.00012123245999356385, 0.00012123245999356385], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 222000/ 476837 | consumed samples: 56832000 | consumed tokens: 116391936000 | elapsed time per iteration (s): 0.68 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.549688E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.773 | TFLOPs: 22.79 | 0: steps: 222000 loss: 2.5068 iter time (s): 0.687 samples/sec: 372.743 31: iteration 222100/ 476837 | consumed samples: 56857600 | consumed tokens: 116444364800 | elapsed time per iteration (s): 0.68 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.550494E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.708 | TFLOPs: 22.79 | 31: iteration 222200/ 476837 | consumed samples: 56883200 | consumed tokens: 116496793600 | elapsed time per iteration (s): 0.68 | learning rate: 1.211E-04 | 
global batch size: 256 | lm loss: 2.546583E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.382 | TFLOPs: 22.77 | 31: iteration 222300/ 476837 | consumed samples: 56908800 | consumed tokens: 116549222400 | elapsed time per iteration (s): 0.68 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.550567E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.811 | TFLOPs: 22.80 | 31: iteration 222400/ 476837 | consumed samples: 56934400 | consumed tokens: 116601651200 | elapsed time per iteration (s): 0.68 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.547504E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.647 | TFLOPs: 22.67 | 31: iteration 222500/ 476837 | consumed samples: 56960000 | consumed tokens: 116654080000 | elapsed time per iteration (s): 0.68 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.548329E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.578 | TFLOPs: 22.72 | 31: iteration 222600/ 476837 | consumed samples: 56985600 | consumed tokens: 116706508800 | elapsed time per iteration (s): 0.71 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.552029E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 362.012 | TFLOPs: 21.90 | 31: iteration 222700/ 476837 | consumed samples: 57011200 | consumed tokens: 116758937600 | elapsed time per iteration (s): 0.68 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.550279E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.436 | TFLOPs: 22.71 | 31: iteration 222800/ 476837 | consumed 
samples: 57036800 | consumed tokens: 116811366400 | elapsed time per iteration (s): 0.68 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.551869E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.389 | TFLOPs: 22.71 | 31: iteration 222900/ 476837 | consumed samples: 57062400 | consumed tokens: 116863795200 | elapsed time per iteration (s): 0.74 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.548259E+00 | grad norm: 0.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 346.814 | TFLOPs: 20.98 | 31: iteration 223000/ 476837 | consumed samples: 57088000 | consumed tokens: 116916224000 | elapsed time per iteration (s): 0.68 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.545956E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.816 | TFLOPs: 22.80 | 31: iteration 223100/ 476837 | consumed samples: 57113600 | consumed tokens: 116968652800 | elapsed time per iteration (s): 0.68 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.550118E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.855 | TFLOPs: 22.80 | 31: iteration 223200/ 476837 | consumed samples: 57139200 | consumed tokens: 117021081600 | elapsed time per iteration (s): 0.68 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.554939E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.063 | TFLOPs: 22.63 | 31: iteration 223300/ 476837 | consumed samples: 57164800 | consumed tokens: 117073510400 | elapsed time per iteration (s): 0.68 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.552320E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 376.855 | TFLOPs: 22.80 | 31: iteration 223400/ 476837 | consumed samples: 57190400 | consumed tokens: 117125939200 | elapsed time per iteration (s): 0.68 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.548886E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.499 | TFLOPs: 22.78 | 31: iteration 223500/ 476837 | consumed samples: 57216000 | consumed tokens: 117178368000 | elapsed time per iteration (s): 0.68 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.551299E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.001 | TFLOPs: 22.81 | 31: iteration 223600/ 476837 | consumed samples: 57241600 | consumed tokens: 117230796800 | elapsed time per iteration (s): 0.68 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.545303E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.006 | TFLOPs: 22.81 | 31: iteration 223700/ 476837 | consumed samples: 57267200 | consumed tokens: 117283225600 | elapsed time per iteration (s): 0.68 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.554588E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.616 | TFLOPs: 22.78 | 31: iteration 223800/ 476837 | consumed samples: 57292800 | consumed tokens: 117335654400 | elapsed time per iteration (s): 0.68 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.546363E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.984 | TFLOPs: 22.81 | 31: iteration 223900/ 476837 | consumed samples: 57318400 | consumed tokens: 117388083200 | elapsed time per iteration (s): 0.76 | learning rate: 1.201E-04 
| global batch size: 256 | lm loss: 2.550828E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 338.331 | TFLOPs: 20.47 | 0: [2023-04-27 16:04:17,810] [INFO] [logging.py:68:log_dist] [Rank 0] step=224000, skipped=0, lr=[0.00012004297572592486, 0.00012004297572592486, 0.00012004297572592486], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 224000/ 476837 | consumed samples: 57344000 | consumed tokens: 117440512000 | elapsed time per iteration (s): 0.68 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.546045E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.853 | TFLOPs: 22.80 | 0: steps: 224000 loss: 2.5773 iter time (s): 0.685 samples/sec: 373.746 31: iteration 224100/ 476837 | consumed samples: 57369600 | consumed tokens: 117492940800 | elapsed time per iteration (s): 0.68 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.546884E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.669 | TFLOPs: 22.79 | 31: iteration 224200/ 476837 | consumed samples: 57395200 | consumed tokens: 117545369600 | elapsed time per iteration (s): 0.68 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.552566E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.694 | TFLOPs: 22.79 | 31: iteration 224300/ 476837 | consumed samples: 57420800 | consumed tokens: 117597798400 | elapsed time per iteration (s): 0.68 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.549001E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.429 | TFLOPs: 22.71 | 31: iteration 224400/ 476837 | consumed samples: 57446400 | consumed tokens: 117650227200 | elapsed 
time per iteration (s): 0.72 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 2.552437E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 353.272 | TFLOPs: 21.37 | 31: iteration 224500/ 476837 | consumed samples: 57472000 | consumed tokens: 117702656000 | elapsed time per iteration (s): 0.68 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.548650E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.631 | TFLOPs: 22.79 | 31: iteration 224600/ 476837 | consumed samples: 57497600 | consumed tokens: 117755084800 | elapsed time per iteration (s): 0.74 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.553606E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 347.734 | TFLOPs: 21.04 | 31: iteration 224700/ 476837 | consumed samples: 57523200 | consumed tokens: 117807513600 | elapsed time per iteration (s): 0.68 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.550476E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.999 | TFLOPs: 22.75 | 31: iteration 224800/ 476837 | consumed samples: 57548800 | consumed tokens: 117859942400 | elapsed time per iteration (s): 0.69 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.548281E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.942 | TFLOPs: 22.56 | 31: iteration 224900/ 476837 | consumed samples: 57574400 | consumed tokens: 117912371200 | elapsed time per iteration (s): 0.73 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.554622E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 349.801 
| TFLOPs: 21.16 | 31: iteration 225000/ 476837 | consumed samples: 57600000 | consumed tokens: 117964800000 | elapsed time per iteration (s): 0.69 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.544324E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.089 | TFLOPs: 22.57 | 31: iteration 225100/ 476837 | consumed samples: 57625600 | consumed tokens: 118017228800 | elapsed time per iteration (s): 0.70 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.640239E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.093 | TFLOPs: 22.27 | 31: iteration 225200/ 476837 | consumed samples: 57651200 | consumed tokens: 118069657600 | elapsed time per iteration (s): 0.68 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.552298E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.713 | TFLOPs: 22.67 | 31: iteration 225300/ 476837 | consumed samples: 57676800 | consumed tokens: 118122086400 | elapsed time per iteration (s): 0.69 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.546991E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.039 | TFLOPs: 22.51 | 31: iteration 225400/ 476837 | consumed samples: 57702400 | consumed tokens: 118174515200 | elapsed time per iteration (s): 0.68 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 2.548895E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.355 | TFLOPs: 22.77 | 31: iteration 225500/ 476837 | consumed samples: 57728000 | consumed tokens: 118226944000 | elapsed time per iteration (s): 0.78 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.549156E+00 | grad 
norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 329.368 | TFLOPs: 19.93 | 31: iteration 225600/ 476837 | consumed samples: 57753600 | consumed tokens: 118279372800 | elapsed time per iteration (s): 0.68 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.544124E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.536 | TFLOPs: 22.78 | 31: iteration 225700/ 476837 | consumed samples: 57779200 | consumed tokens: 118331801600 | elapsed time per iteration (s): 0.69 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.547721E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.886 | TFLOPs: 22.38 | 31: iteration 225800/ 476837 | consumed samples: 57804800 | consumed tokens: 118384230400 | elapsed time per iteration (s): 0.68 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.543299E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.722 | TFLOPs: 22.79 | 31: iteration 225900/ 476837 | consumed samples: 57830400 | consumed tokens: 118436659200 | elapsed time per iteration (s): 0.68 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.550199E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.356 | TFLOPs: 22.77 | 0: [2023-04-27 16:27:35,115] [INFO] [logging.py:68:log_dist] [Rank 0] step=226000, skipped=0, lr=[0.00011885171233861079, 0.00011885171233861079, 0.00011885171233861079], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 226000/ 476837 | consumed samples: 57856000 | consumed tokens: 118489088000 | elapsed time per iteration (s): 0.75 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.544093E+00 | grad norm: 0.307 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 341.298 | TFLOPs: 20.65 | 0: steps: 226000 loss: 2.5795 iter time (s): 0.695 samples/sec: 368.136 31: iteration 226100/ 476837 | consumed samples: 57881600 | consumed tokens: 118541516800 | elapsed time per iteration (s): 0.71 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 2.548906E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 362.754 | TFLOPs: 21.95 | 31: iteration 226200/ 476837 | consumed samples: 57907200 | consumed tokens: 118593945600 | elapsed time per iteration (s): 0.78 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.549082E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 328.171 | TFLOPs: 19.85 | 31: iteration 226300/ 476837 | consumed samples: 57932800 | consumed tokens: 118646374400 | elapsed time per iteration (s): 0.68 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.551937E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.807 | TFLOPs: 22.74 | 31: iteration 226400/ 476837 | consumed samples: 57958400 | consumed tokens: 118698803200 | elapsed time per iteration (s): 0.69 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.552779E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.761 | TFLOPs: 22.55 | 31: iteration 226500/ 476837 | consumed samples: 57984000 | consumed tokens: 118751232000 | elapsed time per iteration (s): 0.68 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.550828E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.924 | TFLOPs: 22.80 | 31: iteration 226600/ 476837 | 
consumed samples: 58009600 | consumed tokens: 118803660800 | elapsed time per iteration (s): 0.68 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.548241E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.225 | TFLOPs: 22.76 | 31: iteration 226700/ 476837 | consumed samples: 58035200 | consumed tokens: 118856089600 | elapsed time per iteration (s): 0.68 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.548662E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.961 | TFLOPs: 22.74 | 31: iteration 226800/ 476837 | consumed samples: 58060800 | consumed tokens: 118908518400 | elapsed time per iteration (s): 0.68 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.545499E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.094 | TFLOPs: 22.69 | 31: iteration 226900/ 476837 | consumed samples: 58086400 | consumed tokens: 118960947200 | elapsed time per iteration (s): 0.68 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.549525E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.387 | TFLOPs: 22.77 | 31: iteration 227000/ 476837 | consumed samples: 58112000 | consumed tokens: 119013376000 | elapsed time per iteration (s): 0.68 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.550734E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.746 | TFLOPs: 22.79 | 31: iteration 227100/ 476837 | consumed samples: 58137600 | consumed tokens: 119065804800 | elapsed time per iteration (s): 0.68 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.545647E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 376.759 | TFLOPs: 22.79 | 31: iteration 227200/ 476837 | consumed samples: 58163200 | consumed tokens: 119118233600 | elapsed time per iteration (s): 0.68 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.549550E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.289 | TFLOPs: 22.70 | 31: iteration 227300/ 476837 | consumed samples: 58188800 | consumed tokens: 119170662400 | elapsed time per iteration (s): 0.68 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.540154E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.670 | TFLOPs: 22.79 | 31: iteration 227400/ 476837 | consumed samples: 58214400 | consumed tokens: 119223091200 | elapsed time per iteration (s): 0.70 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.547436E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 363.715 | TFLOPs: 22.00 | 31: iteration 227500/ 476837 | consumed samples: 58240000 | consumed tokens: 119275520000 | elapsed time per iteration (s): 0.68 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.553510E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.725 | TFLOPs: 22.79 | 31: iteration 227600/ 476837 | consumed samples: 58265600 | consumed tokens: 119327948800 | elapsed time per iteration (s): 0.68 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.547299E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.771 | TFLOPs: 22.79 | 31: iteration 227700/ 476837 | consumed samples: 58291200 | consumed tokens: 119380377600 | elapsed time per iteration (s): 0.68 | learning 
rate: 1.178E-04 | global batch size: 256 | lm loss: 2.544877E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.682 | TFLOPs: 22.79 | 31: iteration 227800/ 476837 | consumed samples: 58316800 | consumed tokens: 119432806400 | elapsed time per iteration (s): 0.68 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.549557E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.874 | TFLOPs: 22.80 | 31: iteration 227900/ 476837 | consumed samples: 58342400 | consumed tokens: 119485235200 | elapsed time per iteration (s): 0.68 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 2.553444E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.775 | TFLOPs: 22.79 | 0: [2023-04-27 16:50:31,802] [INFO] [logging.py:68:log_dist] [Rank 0] step=228000, skipped=0, lr=[0.00011765888086470475, 0.00011765888086470475, 0.00011765888086470475], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 228000/ 476837 | consumed samples: 58368000 | consumed tokens: 119537664000 | elapsed time per iteration (s): 0.69 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 2.547309E+00 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.093 | TFLOPs: 22.51 | 0: steps: 228000 loss: 2.5567 iter time (s): 0.685 samples/sec: 373.748 31: iteration 228100/ 476837 | consumed samples: 58393600 | consumed tokens: 119590092800 | elapsed time per iteration (s): 0.68 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.550130E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.882 | TFLOPs: 22.74 | 31: iteration 228200/ 476837 | consumed samples: 58419200 | consumed tokens: 
119642521600 | elapsed time per iteration (s): 0.78 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.543498E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 328.129 | TFLOPs: 19.85 | 31: iteration 228300/ 476837 | consumed samples: 58444800 | consumed tokens: 119694950400 | elapsed time per iteration (s): 0.68 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.548478E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.852 | TFLOPs: 22.80 | 31: iteration 228400/ 476837 | consumed samples: 58470400 | consumed tokens: 119747379200 | elapsed time per iteration (s): 0.69 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.544787E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.102 | TFLOPs: 22.51 | 31: iteration 228500/ 476837 | consumed samples: 58496000 | consumed tokens: 119799808000 | elapsed time per iteration (s): 0.68 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.544724E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.495 | TFLOPs: 22.78 | 31: iteration 228600/ 476837 | consumed samples: 58521600 | consumed tokens: 119852236800 | elapsed time per iteration (s): 0.68 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.542191E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.782 | TFLOPs: 22.79 | 31: iteration 228700/ 476837 | consumed samples: 58547200 | consumed tokens: 119904665600 | elapsed time per iteration (s): 0.71 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.548598E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 360.414 | TFLOPs: 21.80 | 31: iteration 228800/ 476837 | consumed samples: 58572800 | consumed tokens: 119957094400 | elapsed time per iteration (s): 0.68 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.546292E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.715 | TFLOPs: 22.79 | 31: iteration 228900/ 476837 | consumed samples: 58598400 | consumed tokens: 120009523200 | elapsed time per iteration (s): 0.72 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.548686E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 356.503 | TFLOPs: 21.57 | 31: iteration 229000/ 476837 | consumed samples: 58624000 | consumed tokens: 120061952000 | elapsed time per iteration (s): 0.68 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.547223E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.711 | TFLOPs: 22.79 | 31: iteration 229100/ 476837 | consumed samples: 58649600 | consumed tokens: 120114380800 | elapsed time per iteration (s): 0.69 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.544365E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.716 | TFLOPs: 22.31 | 31: iteration 229200/ 476837 | consumed samples: 58675200 | consumed tokens: 120166809600 | elapsed time per iteration (s): 0.68 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.543353E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.753 | TFLOPs: 22.79 | 31: iteration 229300/ 476837 | consumed samples: 58700800 | consumed tokens: 120219238400 | elapsed time per iteration (s): 0.68 | learning rate: 1.169E-04 | global batch size: 256 | lm 
loss: 2.545083E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.746 | TFLOPs: 22.79 | 31: iteration 229400/ 476837 | consumed samples: 58726400 | consumed tokens: 120271667200 | elapsed time per iteration (s): 0.68 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.542650E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.108 | TFLOPs: 22.75 | 31: iteration 229500/ 476837 | consumed samples: 58752000 | consumed tokens: 120324096000 | elapsed time per iteration (s): 0.68 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.552086E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.862 | TFLOPs: 22.68 | 31: iteration 229600/ 476837 | consumed samples: 58777600 | consumed tokens: 120376524800 | elapsed time per iteration (s): 0.68 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.548410E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.721 | TFLOPs: 22.79 | 31: iteration 229700/ 476837 | consumed samples: 58803200 | consumed tokens: 120428953600 | elapsed time per iteration (s): 0.68 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.545033E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.731 | TFLOPs: 22.79 | 31: iteration 229800/ 476837 | consumed samples: 58828800 | consumed tokens: 120481382400 | elapsed time per iteration (s): 0.68 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.540880E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.317 | TFLOPs: 22.77 | 31: iteration 229900/ 476837 | consumed samples: 58854400 | 
consumed tokens: 120533811200 | elapsed time per iteration (s): 0.68 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.542066E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.913 | TFLOPs: 22.74 | 0: [2023-04-27 17:13:31,141] [INFO] [logging.py:68:log_dist] [Rank 0] step=230000, skipped=0, lr=[0.00011646469261507734, 0.00011646469261507734, 0.00011646469261507734], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 230000/ 476837 | consumed samples: 58880000 | consumed tokens: 120586240000 | elapsed time per iteration (s): 0.68 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.550876E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.079 | TFLOPs: 22.75 | 0: steps: 230000 loss: 2.5597 iter time (s): 0.686 samples/sec: 373.106 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 230000 | lm loss value: 2.940941E+00 | lm loss PPL: 1.893366E+01 | 31: ------------------------------------------------------------------------------------------------- 31: iteration 230100/ 476837 | consumed samples: 58905600 | consumed tokens: 120638668800 | elapsed time per iteration (s): 0.68 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.545551E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.114 | TFLOPs: 22.69 | 31: iteration 230200/ 476837 | consumed samples: 58931200 | consumed tokens: 120691097600 | elapsed time per iteration (s): 0.68 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.548903E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.576 | TFLOPs: 22.78 | 31: iteration 230300/ 476837 | consumed samples: 
58956800 | consumed tokens: 120743526400 | elapsed time per iteration (s): 0.68 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.546413E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.676 | TFLOPs: 22.79 | 31: iteration 230400/ 476837 | consumed samples: 58982400 | consumed tokens: 120795955200 | elapsed time per iteration (s): 0.79 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.546655E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 325.318 | TFLOPs: 19.68 | 31: iteration 230500/ 476837 | consumed samples: 59008000 | consumed tokens: 120848384000 | elapsed time per iteration (s): 0.68 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.542779E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.752 | TFLOPs: 22.79 | 31: iteration 230600/ 476837 | consumed samples: 59033600 | consumed tokens: 120900812800 | elapsed time per iteration (s): 0.68 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.549843E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.034 | TFLOPs: 22.69 | 31: iteration 230700/ 476837 | consumed samples: 59059200 | consumed tokens: 120953241600 | elapsed time per iteration (s): 0.69 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.546398E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.620 | TFLOPs: 22.42 | 31: iteration 230800/ 476837 | consumed samples: 59084800 | consumed tokens: 121005670400 | elapsed time per iteration (s): 0.68 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.542747E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 374.016 | TFLOPs: 22.63 | 31: iteration 230900/ 476837 | consumed samples: 59110400 | consumed tokens: 121058099200 | elapsed time per iteration (s): 0.68 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 2.542352E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.833 | TFLOPs: 22.68 | 31: iteration 231000/ 476837 | consumed samples: 59136000 | consumed tokens: 121110528000 | elapsed time per iteration (s): 0.69 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 2.545983E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.186 | TFLOPs: 22.52 | 31: iteration 231100/ 476837 | consumed samples: 59161600 | consumed tokens: 121162956800 | elapsed time per iteration (s): 0.68 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.545322E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.688 | TFLOPs: 22.79 | 31: iteration 231200/ 476837 | consumed samples: 59187200 | consumed tokens: 121215385600 | elapsed time per iteration (s): 0.72 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.542777E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 354.690 | TFLOPs: 21.46 | 31: iteration 231300/ 476837 | consumed samples: 59212800 | consumed tokens: 121267814400 | elapsed time per iteration (s): 0.69 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.544876E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.595 | TFLOPs: 22.54 | 31: iteration 231400/ 476837 | consumed samples: 59238400 | consumed tokens: 121320243200 | elapsed time per iteration (s): 0.68 | learning rate: 1.156E-04 | global 
batch size: 256 | lm loss: 2.541216E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.345 | TFLOPs: 22.77 | 31: iteration 231500/ 476837 | consumed samples: 59264000 | consumed tokens: 121372672000 | elapsed time per iteration (s): 0.68 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 2.540125E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.709 | TFLOPs: 22.79 | 31: iteration 231600/ 476837 | consumed samples: 59289600 | consumed tokens: 121425100800 | elapsed time per iteration (s): 0.68 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.544543E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.002 | TFLOPs: 22.81 | 31: iteration 231700/ 476837 | consumed samples: 59315200 | consumed tokens: 121477529600 | elapsed time per iteration (s): 0.72 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.547103E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 357.785 | TFLOPs: 21.65 | 31: iteration 231800/ 476837 | consumed samples: 59340800 | consumed tokens: 121529958400 | elapsed time per iteration (s): 0.68 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.544235E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.004 | TFLOPs: 22.69 | 31: iteration 231900/ 476837 | consumed samples: 59366400 | consumed tokens: 121582387200 | elapsed time per iteration (s): 0.69 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.538444E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.785 | TFLOPs: 22.55 | 0: [2023-04-27 17:36:34,864] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=232000, skipped=0, lr=[0.00011526935914095284, 0.00011526935914095284, 0.00011526935914095284], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 232000/ 476837 | consumed samples: 59392000 | consumed tokens: 121634816000 | elapsed time per iteration (s): 0.69 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.542667E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.016 | TFLOPs: 22.51 | 0: steps: 232000 loss: 2.5066 iter time (s): 0.688 samples/sec: 371.993 31: iteration 232100/ 476837 | consumed samples: 59417600 | consumed tokens: 121687244800 | elapsed time per iteration (s): 0.68 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 2.540273E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.745 | TFLOPs: 22.79 | 31: iteration 232200/ 476837 | consumed samples: 59443200 | consumed tokens: 121739673600 | elapsed time per iteration (s): 0.68 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.546690E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.804 | TFLOPs: 22.80 | 31: iteration 232300/ 476837 | consumed samples: 59468800 | consumed tokens: 121792102400 | elapsed time per iteration (s): 0.69 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.538673E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.295 | TFLOPs: 22.40 | 31: iteration 232400/ 476837 | consumed samples: 59494400 | consumed tokens: 121844531200 | elapsed time per iteration (s): 0.68 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.542149E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
375.810 | TFLOPs: 22.74 | 31: iteration 232500/ 476837 | consumed samples: 59520000 | consumed tokens: 121896960000 | elapsed time per iteration (s): 0.68 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.545296E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.375 | TFLOPs: 22.65 | 31: iteration 232600/ 476837 | consumed samples: 59545600 | consumed tokens: 121949388800 | elapsed time per iteration (s): 0.76 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.542904E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 337.443 | TFLOPs: 20.41 | 31: iteration 232700/ 476837 | consumed samples: 59571200 | consumed tokens: 122001817600 | elapsed time per iteration (s): 0.71 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.544928E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.645 | TFLOPs: 21.88 | 31: iteration 232800/ 476837 | consumed samples: 59596800 | consumed tokens: 122054246400 | elapsed time per iteration (s): 0.68 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.541480E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.046 | TFLOPs: 22.69 | 31: iteration 232900/ 476837 | consumed samples: 59622400 | consumed tokens: 122106675200 | elapsed time per iteration (s): 0.68 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.545245E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.939 | TFLOPs: 22.80 | 31: iteration 233000/ 476837 | consumed samples: 59648000 | consumed tokens: 122159104000 | elapsed time per iteration (s): 0.68 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.543299E+00 | 
grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.571 | TFLOPs: 22.78 | 31: iteration 233100/ 476837 | consumed samples: 59673600 | consumed tokens: 122211532800 | elapsed time per iteration (s): 0.68 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.539105E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.266 | TFLOPs: 22.76 | 31: iteration 233200/ 476837 | consumed samples: 59699200 | consumed tokens: 122263961600 | elapsed time per iteration (s): 0.73 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.537758E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 351.378 | TFLOPs: 21.26 | 31: iteration 233300/ 476837 | consumed samples: 59724800 | consumed tokens: 122316390400 | elapsed time per iteration (s): 0.68 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.540664E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.961 | TFLOPs: 22.81 | 31: iteration 233400/ 476837 | consumed samples: 59750400 | consumed tokens: 122368819200 | elapsed time per iteration (s): 0.68 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.545108E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.956 | TFLOPs: 22.80 | 31: iteration 233500/ 476837 | consumed samples: 59776000 | consumed tokens: 122421248000 | elapsed time per iteration (s): 0.68 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.545336E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.875 | TFLOPs: 22.80 | 31: iteration 233600/ 476837 | consumed samples: 59801600 | consumed tokens: 
122473676800 | elapsed time per iteration (s): 0.69 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.542300E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.720 | TFLOPs: 22.49 | 31: iteration 233700/ 476837 | consumed samples: 59827200 | consumed tokens: 122526105600 | elapsed time per iteration (s): 0.68 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.543130E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.486 | TFLOPs: 22.78 | 31: iteration 233800/ 476837 | consumed samples: 59852800 | consumed tokens: 122578534400 | elapsed time per iteration (s): 0.72 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.543756E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 356.893 | TFLOPs: 21.59 | 31: iteration 233900/ 476837 | consumed samples: 59878400 | consumed tokens: 122630963200 | elapsed time per iteration (s): 0.71 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.542845E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.006 | TFLOPs: 21.84 | 0: [2023-04-27 17:59:39,339] [INFO] [logging.py:68:log_dist] [Rank 0] step=234000, skipped=0, lr=[0.00011407309219643292, 0.00011407309219643292, 0.00011407309219643292], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 234000/ 476837 | consumed samples: 59904000 | consumed tokens: 122683392000 | elapsed time per iteration (s): 0.68 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.544796E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.529 | TFLOPs: 22.78 | 0: steps: 234000 loss: 2.5808 iter time (s): 0.689 samples/sec: 371.727 31: iteration 234100/ 
476837 | consumed samples: 59929600 | consumed tokens: 122735820800 | elapsed time per iteration (s): 0.68 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.545540E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.046 | TFLOPs: 22.75 | 31: iteration 234200/ 476837 | consumed samples: 59955200 | consumed tokens: 122788249600 | elapsed time per iteration (s): 0.69 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.540893E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.847 | TFLOPs: 22.37 | 31: iteration 234300/ 476837 | consumed samples: 59980800 | consumed tokens: 122840678400 | elapsed time per iteration (s): 0.68 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.541885E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.941 | TFLOPs: 22.80 | 31: iteration 234400/ 476837 | consumed samples: 60006400 | consumed tokens: 122893107200 | elapsed time per iteration (s): 0.68 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 2.542996E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.838 | TFLOPs: 22.80 | 31: iteration 234500/ 476837 | consumed samples: 60032000 | consumed tokens: 122945536000 | elapsed time per iteration (s): 0.69 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 2.536525E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.162 | TFLOPs: 22.51 | 31: iteration 234600/ 476837 | consumed samples: 60057600 | consumed tokens: 122997964800 | elapsed time per iteration (s): 0.69 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.536749E+00 | grad norm: 0.315 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.716 | TFLOPs: 22.49 | 31: iteration 234700/ 476837 | consumed samples: 60083200 | consumed tokens: 123050393600 | elapsed time per iteration (s): 0.70 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.545119E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.517 | TFLOPs: 22.17 | 31: iteration 234800/ 476837 | consumed samples: 60108800 | consumed tokens: 123102822400 | elapsed time per iteration (s): 0.75 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.542705E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 341.787 | TFLOPs: 20.68 | 31: iteration 234900/ 476837 | consumed samples: 60134400 | consumed tokens: 123155251200 | elapsed time per iteration (s): 0.69 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.538994E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.960 | TFLOPs: 22.32 | 31: iteration 235000/ 476837 | consumed samples: 60160000 | consumed tokens: 123207680000 | elapsed time per iteration (s): 0.68 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.542058E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.126 | TFLOPs: 22.75 | 31: iteration 235100/ 476837 | consumed samples: 60185600 | consumed tokens: 123260108800 | elapsed time per iteration (s): 0.68 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.543501E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.765 | TFLOPs: 22.79 | 31: iteration 235200/ 476837 | consumed samples: 60211200 | consumed tokens: 123312537600 | elapsed time per iteration (s): 0.68 | 
learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.540789E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.899 | TFLOPs: 22.68 | 31: iteration 235300/ 476837 | consumed samples: 60236800 | consumed tokens: 123364966400 | elapsed time per iteration (s): 0.68 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.538860E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.764 | TFLOPs: 22.79 | 31: iteration 235400/ 476837 | consumed samples: 60262400 | consumed tokens: 123417395200 | elapsed time per iteration (s): 0.75 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.544256E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 340.457 | TFLOPs: 20.60 | 31: iteration 235500/ 476837 | consumed samples: 60288000 | consumed tokens: 123469824000 | elapsed time per iteration (s): 0.70 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.537071E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 367.087 | TFLOPs: 22.21 | 31: iteration 235600/ 476837 | consumed samples: 60313600 | consumed tokens: 123522252800 | elapsed time per iteration (s): 0.68 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.539608E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.308 | TFLOPs: 22.77 | 31: iteration 235700/ 476837 | consumed samples: 60339200 | consumed tokens: 123574681600 | elapsed time per iteration (s): 0.68 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.541196E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.111 | TFLOPs: 22.63 | 31: 
iteration 235800/ 476837 | consumed samples: 60364800 | consumed tokens: 123627110400 | elapsed time per iteration (s): 0.74 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.541595E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 343.641 | TFLOPs: 20.79 | 31: iteration 235900/ 476837 | consumed samples: 60390400 | consumed tokens: 123679539200 | elapsed time per iteration (s): 0.68 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.537319E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.139 | TFLOPs: 22.82 | 0: [2023-04-27 18:22:48,534] [INFO] [logging.py:68:log_dist] [Rank 0] step=236000, skipped=0, lr=[0.00011287610370098398, 0.00011287610370098398, 0.00011287610370098398], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 236000/ 476837 | consumed samples: 60416000 | consumed tokens: 123731968000 | elapsed time per iteration (s): 0.68 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.538225E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.057 | TFLOPs: 22.69 | 0: steps: 236000 loss: 2.4787 iter time (s): 0.691 samples/sec: 370.560 31: iteration 236100/ 476837 | consumed samples: 60441600 | consumed tokens: 123784396800 | elapsed time per iteration (s): 0.68 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.539881E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.867 | TFLOPs: 22.74 | 31: iteration 236200/ 476837 | consumed samples: 60467200 | consumed tokens: 123836825600 | elapsed time per iteration (s): 0.68 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.543022E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 377.084 | TFLOPs: 22.81 | 31: iteration 236300/ 476837 | consumed samples: 60492800 | consumed tokens: 123889254400 | elapsed time per iteration (s): 0.69 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.538919E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.440 | TFLOPs: 22.47 | 31: iteration 236400/ 476837 | consumed samples: 60518400 | consumed tokens: 123941683200 | elapsed time per iteration (s): 0.68 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.532992E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.031 | TFLOPs: 22.63 | 31: iteration 236500/ 476837 | consumed samples: 60544000 | consumed tokens: 123994112000 | elapsed time per iteration (s): 0.68 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.537441E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.880 | TFLOPs: 22.68 | 31: iteration 236600/ 476837 | consumed samples: 60569600 | consumed tokens: 124046540800 | elapsed time per iteration (s): 0.69 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.539636E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.301 | TFLOPs: 22.58 | 31: iteration 236700/ 476837 | consumed samples: 60595200 | consumed tokens: 124098969600 | elapsed time per iteration (s): 0.73 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.536442E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 350.950 | TFLOPs: 21.23 | 31: iteration 236800/ 476837 | consumed samples: 60620800 | consumed tokens: 124151398400 | elapsed time per iteration (s): 0.68 | learning rate: 1.124E-04 | global batch 
size: 256 | lm loss: 2.543757E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.872 | TFLOPs: 22.80 | 31: iteration 236900/ 476837 | consumed samples: 60646400 | consumed tokens: 124203827200 | elapsed time per iteration (s): 0.71 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.542797E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.811 | TFLOPs: 21.89 | 31: iteration 237000/ 476837 | consumed samples: 60672000 | consumed tokens: 124256256000 | elapsed time per iteration (s): 0.69 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.536381E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.434 | TFLOPs: 22.59 | 31: iteration 237100/ 476837 | consumed samples: 60697600 | consumed tokens: 124308684800 | elapsed time per iteration (s): 0.85 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.537388E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 300.241 | TFLOPs: 18.16 | 31: iteration 237200/ 476837 | consumed samples: 60723200 | consumed tokens: 124361113600 | elapsed time per iteration (s): 0.69 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.539577E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.037 | TFLOPs: 22.51 | 31: iteration 237300/ 476837 | consumed samples: 60748800 | consumed tokens: 124413542400 | elapsed time per iteration (s): 0.69 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 2.536590E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.669 | TFLOPs: 22.61 | 31: iteration 237400/ 476837 | consumed samples: 
60774400 | consumed tokens: 124465971200 | elapsed time per iteration (s): 0.68 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.539687E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.591 | TFLOPs: 22.72 | 31: iteration 237500/ 476837 | consumed samples: 60800000 | consumed tokens: 124518400000 | elapsed time per iteration (s): 0.68 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.541396E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.310 | TFLOPs: 22.71 | 31: iteration 237600/ 476837 | consumed samples: 60825600 | consumed tokens: 124570828800 | elapsed time per iteration (s): 0.69 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.540575E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.424 | TFLOPs: 22.59 | 31: iteration 237700/ 476837 | consumed samples: 60851200 | consumed tokens: 124623257600 | elapsed time per iteration (s): 0.68 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.532469E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.153 | TFLOPs: 22.70 | 31: iteration 237800/ 476837 | consumed samples: 60876800 | consumed tokens: 124675686400 | elapsed time per iteration (s): 0.78 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.537986E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 327.616 | TFLOPs: 19.82 | 31: iteration 237900/ 476837 | consumed samples: 60902400 | consumed tokens: 124728115200 | elapsed time per iteration (s): 0.74 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.538870E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 343.865 | TFLOPs: 20.80 | 0: [2023-04-27 18:46:17,606] [INFO] [logging.py:68:log_dist] [Rank 0] step=238000, skipped=0, lr=[0.0001116786057018957, 0.0001116786057018957, 0.0001116786057018957], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 238000/ 476837 | consumed samples: 60928000 | consumed tokens: 124780544000 | elapsed time per iteration (s): 0.70 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.538566E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.016 | TFLOPs: 22.02 | 0: steps: 238000 loss: 2.5214 iter time (s): 0.701 samples/sec: 365.298 31: iteration 238100/ 476837 | consumed samples: 60953600 | consumed tokens: 124832972800 | elapsed time per iteration (s): 0.68 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.540648E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.170 | TFLOPs: 22.70 | 31: iteration 238200/ 476837 | consumed samples: 60979200 | consumed tokens: 124885401600 | elapsed time per iteration (s): 0.69 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.538212E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.909 | TFLOPs: 22.56 | 31: iteration 238300/ 476837 | consumed samples: 61004800 | consumed tokens: 124937830400 | elapsed time per iteration (s): 0.74 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.536521E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 348.040 | TFLOPs: 21.06 | 31: iteration 238400/ 476837 | consumed samples: 61030400 | consumed tokens: 124990259200 | elapsed time per iteration (s): 0.68 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.534393E+00 | grad norm: 0.295 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.072 | TFLOPs: 22.69 | 31: iteration 238500/ 476837 | consumed samples: 61056000 | consumed tokens: 125042688000 | elapsed time per iteration (s): 0.72 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.538982E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 355.087 | TFLOPs: 21.48 | 31: iteration 238600/ 476837 | consumed samples: 61081600 | consumed tokens: 125095116800 | elapsed time per iteration (s): 0.68 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.536891E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.886 | TFLOPs: 22.74 | 31: iteration 238700/ 476837 | consumed samples: 61107200 | consumed tokens: 125147545600 | elapsed time per iteration (s): 0.68 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.538724E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.871 | TFLOPs: 22.80 | 31: iteration 238800/ 476837 | consumed samples: 61132800 | consumed tokens: 125199974400 | elapsed time per iteration (s): 0.68 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.542991E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.782 | TFLOPs: 22.79 | 31: iteration 238900/ 476837 | consumed samples: 61158400 | consumed tokens: 125252403200 | elapsed time per iteration (s): 0.68 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.538945E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.191 | TFLOPs: 22.70 | 31: iteration 239000/ 476837 | consumed samples: 61184000 | consumed tokens: 125304832000 | elapsed time per 
iteration (s): 0.71 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.532462E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 362.427 | TFLOPs: 21.93 | 31: iteration 239100/ 476837 | consumed samples: 61209600 | consumed tokens: 125357260800 | elapsed time per iteration (s): 0.81 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.536977E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 316.389 | TFLOPs: 19.14 | 31: iteration 239200/ 476837 | consumed samples: 61235200 | consumed tokens: 125409689600 | elapsed time per iteration (s): 0.81 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.531608E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 314.576 | TFLOPs: 19.03 | 31: iteration 239300/ 476837 | consumed samples: 61260800 | consumed tokens: 125462118400 | elapsed time per iteration (s): 0.72 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.538751E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 356.200 | TFLOPs: 21.55 | 31: iteration 239400/ 476837 | consumed samples: 61286400 | consumed tokens: 125514547200 | elapsed time per iteration (s): 0.73 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.537774E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 349.792 | TFLOPs: 21.16 | 31: iteration 239500/ 476837 | consumed samples: 61312000 | consumed tokens: 125566976000 | elapsed time per iteration (s): 0.68 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.538457E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.978 | 
TFLOPs: 22.81 | 31: iteration 239600/ 476837 | consumed samples: 61337600 | consumed tokens: 125619404800 | elapsed time per iteration (s): 0.78 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 2.534867E+00 | grad norm: 0.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 327.674 | TFLOPs: 19.82 | 31: iteration 239700/ 476837 | consumed samples: 61363200 | consumed tokens: 125671833600 | elapsed time per iteration (s): 0.73 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 2.536942E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 349.810 | TFLOPs: 21.16 | 31: iteration 239800/ 476837 | consumed samples: 61388800 | consumed tokens: 125724262400 | elapsed time per iteration (s): 0.73 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.538938E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 350.743 | TFLOPs: 21.22 | 31: iteration 239900/ 476837 | consumed samples: 61414400 | consumed tokens: 125776691200 | elapsed time per iteration (s): 0.68 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.536353E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.658 | TFLOPs: 22.79 | 0: [2023-04-27 19:10:06,829] [INFO] [logging.py:68:log_dist] [Rank 0] step=240000, skipped=0, lr=[0.00011048081033671671, 0.00011048081033671671, 0.00011048081033671671], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 240000/ 476837 | consumed samples: 61440000 | consumed tokens: 125829120000 | elapsed time per iteration (s): 0.68 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.534193E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.087 | TFLOPs: 22.75 | 
0: steps: 240000 loss: 2.5565 iter time (s): 0.711 samples/sec: 360.018 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 240000 | lm loss value: 2.979465E+00 | lm loss PPL: 1.967730E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 240000 to checkpoints_1b1250b1b5 0: [2023-04-27 19:10:07,152] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step240000 is begin to save! 0: [2023-04-27 19:10:07,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_01-model_00-model_states.pt... 0: [2023-04-27 19:10:07,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_01-model_00-model_states.pt. 0: [2023-04-27 19:10:07,546] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_03-model_00-model_states.pt... 0: [2023-04-27 19:10:07,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_03-model_00-model_states.pt. 0: [2023-04-27 19:10:07,642] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_04-model_00-model_states.pt... 0: [2023-04-27 19:10:07,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_04-model_00-model_states.pt. 0: [2023-04-27 19:10:07,743] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_05-model_00-model_states.pt... 0: [2023-04-27 19:10:07,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_05-model_00-model_states.pt. 
0: [2023-04-27 19:10:07,836] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_06-model_00-model_states.pt... 0: [2023-04-27 19:10:07,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_06-model_00-model_states.pt. 0: [2023-04-27 19:10:07,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_07-model_00-model_states.pt... 0: [2023-04-27 19:10:08,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_07-model_00-model_states.pt. 0: [2023-04-27 19:10:08,019] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_08-model_00-model_states.pt... 0: [2023-04-27 19:10:08,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_08-model_00-model_states.pt. 0: [2023-04-27 19:10:08,111] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_09-model_00-model_states.pt... 0: [2023-04-27 19:10:08,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_09-model_00-model_states.pt. 0: [2023-04-27 19:10:08,208] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_10-model_00-model_states.pt... 0: [2023-04-27 19:10:08,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_10-model_00-model_states.pt. 0: [2023-04-27 19:10:08,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_11-model_00-model_states.pt... 0: [2023-04-27 19:10:08,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_11-model_00-model_states.pt. 
0: [2023-04-27 19:10:08,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_12-model_00-model_states.pt... 0: [2023-04-27 19:10:08,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_12-model_00-model_states.pt. 0: [2023-04-27 19:10:08,479] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_13-model_00-model_states.pt... 0: [2023-04-27 19:10:08,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_13-model_00-model_states.pt. 0: [2023-04-27 19:10:08,573] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_14-model_00-model_states.pt... 0: [2023-04-27 19:10:08,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_14-model_00-model_states.pt. 0: [2023-04-27 19:10:08,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_15-model_00-model_states.pt... 0: [2023-04-27 19:10:08,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_15-model_00-model_states.pt. 0: [2023-04-27 19:10:08,753] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_16-model_00-model_states.pt... 0: [2023-04-27 19:10:08,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_16-model_00-model_states.pt. 0: [2023-04-27 19:10:08,845] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_17-model_00-model_states.pt... 0: [2023-04-27 19:10:08,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_17-model_00-model_states.pt. 
0: [2023-04-27 19:10:08,931] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_18-model_00-model_states.pt... 0: [2023-04-27 19:10:09,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_18-model_00-model_states.pt. 0: [2023-04-27 19:10:09,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_19-model_00-model_states.pt... 0: [2023-04-27 19:10:09,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_19-model_00-model_states.pt. 0: [2023-04-27 19:10:09,120] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_20-model_00-model_states.pt... 0: [2023-04-27 19:10:09,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_20-model_00-model_states.pt. 0: [2023-04-27 19:10:09,212] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_21-model_00-model_states.pt... 0: [2023-04-27 19:10:09,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_21-model_00-model_states.pt. 0: [2023-04-27 19:10:09,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_22-model_00-model_states.pt... 0: [2023-04-27 19:10:09,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_22-model_00-model_states.pt. 0: [2023-04-27 19:10:09,390] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_23-model_00-model_states.pt... 0: [2023-04-27 19:10:09,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_23-model_00-model_states.pt. 
0: [2023-04-27 19:10:09,479] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_24-model_00-model_states.pt... 0: [2023-04-27 19:10:09,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_24-model_00-model_states.pt. 0: [2023-04-27 19:10:09,553] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_25-model_00-model_states.pt... 0: [2023-04-27 19:10:09,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_25-model_00-model_states.pt. 0: [2023-04-27 19:10:09,643] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_26-model_00-model_states.pt... 0: [2023-04-27 19:10:09,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_26-model_00-model_states.pt. 0: [2023-04-27 19:10:09,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_27-model_00-model_states.pt... 0: [2023-04-27 19:10:09,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_27-model_00-model_states.pt. 0: [2023-04-27 19:10:09,829] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_28-model_00-model_states.pt... 0: [2023-04-27 19:10:09,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_28-model_00-model_states.pt. 0: [2023-04-27 19:10:09,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/layer_30-model_00-model_states.pt... 0: [2023-04-27 19:10:09,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/layer_30-model_00-model_states.pt. 
0: [2023-04-27 19:10:09,920] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step240000/mp_rank_00_model_states.pt 0: [2023-04-27 19:10:09,920] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/mp_rank_00_model_states.pt... 0: [2023-04-27 19:10:10,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/mp_rank_00_model_states.pt. 0: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 8: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 24: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 24: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 28: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 28: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 26: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 
26: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 26: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 30: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 30: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 16: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 16: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 16: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 0: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 4: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 2: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 8: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 11: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 11: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 11: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 3: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 10: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 10: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 10: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 10: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 9: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 9: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 14: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 14: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 14: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 
15: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 15: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 12: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 12: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 12: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 13: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 13: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 
20: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 20: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 20: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 19: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 19: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 19: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 19: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 18: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 18: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 18: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 
24: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 24: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 17: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 17: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 27: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 27: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 27: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 21: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 
23: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 23: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 23: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 29: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 29: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 29: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 25: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 25: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 25: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 28: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 
28: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 26: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 26: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 26: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 26: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 30: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 30: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 31: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 31: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 31: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 
16: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 22: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 22: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 22: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 6: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
1: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 2: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 8: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 
8: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 8: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 8: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 8: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 11: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 11: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 11: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 11: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 3: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
10: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 10: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 10: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 9: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 9: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 14: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 15: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 15: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 15: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 15: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 15: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 
12: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 13: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 13: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 13: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 13: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 20: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 19: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 19: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 18: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 18: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 
18: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 24: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 17: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 17: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 27: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 21: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 21: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 23: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 23: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 23: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 
23: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 29: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 25: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 28: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 30: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 30: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 30: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 31: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 16: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 22: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 
22: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 22: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 6: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 4: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 7: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 5: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 2: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
8: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 11: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 3: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 10: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 9: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 9: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 14: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 14: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 12: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 12: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 12: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 
20: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 20: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 19: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 18: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 24: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 24: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 24: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 17: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 17: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 17: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 
27: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 21: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 23: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 29: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 29: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 29: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 29: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 25: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 25: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 25: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 
28: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 30: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 31: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 31: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 31: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 16: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 16: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 16: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 22: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 6: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 4: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 7: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 9: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 14: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 12: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 20: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 19: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 
18: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 17: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 27: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 21: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 25: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 28: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 31: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 6: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 4: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 9: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 14: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 
20: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 27: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 21: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 28: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 21: [2023-04-27 19:10:10,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 0: [2023-04-27 19:10:10,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-04-27 19:10:10,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-27 19:10:10,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 0: [2023-04-27 19:10:10,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-27 19:10:10,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-27 19:10:10,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
0: [2023-04-27 19:10:10,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-27 19:10:10,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-27 19:10:10,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-27 19:10:10,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-27 19:10:10,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 0: [2023-04-27 19:10:10,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 0: [2023-04-27 19:10:10,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-04-27 19:10:10,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-27 19:10:10,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 0: [2023-04-27 19:10:10,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-27 19:10:10,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-27 19:10:10,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
0: [2023-04-27 19:10:10,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 16: [2023-04-27 19:10:10,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-27 19:10:10,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 16: [2023-04-27 19:10:10,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-27 19:10:10,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-27 19:10:10,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-27 19:10:10,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-27 19:10:10,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 16: [2023-04-27 19:10:10,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 16: [2023-04-27 19:10:10,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 16: [2023-04-27 19:10:10,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 
16: [2023-04-27 19:10:10,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 16: [2023-04-27 19:10:10,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-27 19:10:10,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-27 19:10:10,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 16: [2023-04-27 19:10:10,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 16: [2023-04-27 19:10:10,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-27 19:10:10,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-27 19:10:10,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 16: [2023-04-27 19:10:10,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-27 19:10:10,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-27 19:10:10,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 
9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 
9: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 9: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 9: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 
8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 8: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
8: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 8: [2023-04-27 19:10:10,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 8: [2023-04-27 19:10:10,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 5: [2023-04-27 19:10:10,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-04-27 19:10:10,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-04-27 19:10:10,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-27 19:10:10,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-04-27 19:10:10,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-27 19:10:10,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-27 19:10:10,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-27 19:10:10,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-27 19:10:10,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-27 19:10:10,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-04-27 19:10:10,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 5: [2023-04-27 19:10:10,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 5: [2023-04-27 19:10:10,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 5: [2023-04-27 19:10:10,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 5: [2023-04-27 19:10:10,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 10: [2023-04-27 19:10:10,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 
10: [2023-04-27 19:10:10,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-27 19:10:10,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 10: [2023-04-27 19:10:10,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-27 19:10:10,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 10: [2023-04-27 19:10:10,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-27 19:10:10,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2023-04-27 19:10:10,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-27 19:10:10,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 10: [2023-04-27 19:10:10,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 10: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
10: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 30: [2023-04-27 19:10:10,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 10: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 10: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 10: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 30: [2023-04-27 19:10:10,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-27 19:10:10,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-27 19:10:10,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-27 19:10:10,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-27 19:10:10,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-27 19:10:10,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-27 19:10:10,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 30: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 10: [2023-04-27 19:10:10,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 10: [2023-04-27 19:10:10,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-27 19:10:10,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-27 19:10:10,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 10: [2023-04-27 19:10:10,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 10: [2023-04-27 19:10:10,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 10: [2023-04-27 19:10:10,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-27 19:10:10,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2023-04-27 19:10:10,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 
19: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 19: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 19: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 13: [2023-04-27 19:10:10,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-27 19:10:10,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-27 19:10:10,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2023-04-27 19:10:10,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-27 19:10:10,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2023-04-27 19:10:10,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
13: [2023-04-27 19:10:10,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
3: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 3: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 
28: [2023-04-27 19:10:10,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-27 19:10:10,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-27 19:10:10,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-27 19:10:10,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-27 19:10:10,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-27 19:10:10,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-27 19:10:10,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-27 19:10:10,268] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 28: [2023-04-27 19:10:10,268] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 28: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 28: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 28: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 13: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2023-04-27 19:10:10,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 13: [2023-04-27 19:10:10,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 
14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 14: [2023-04-27 19:10:10,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-27 19:10:10,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 
14: [2023-04-27 19:10:10,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-27 19:10:10,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2023-04-27 19:10:10,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 14: [2023-04-27 19:10:10,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-27 19:10:10,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 13: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 14: [2023-04-27 19:10:10,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 14: [2023-04-27 19:10:10,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 21: [2023-04-27 19:10:10,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 21: [2023-04-27 19:10:10,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2023-04-27 19:10:10,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 21: [2023-04-27 19:10:10,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 
21: [2023-04-27 19:10:10,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-27 19:10:10,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2023-04-27 19:10:10,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-27 19:10:10,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-27 19:10:10,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-27 19:10:10,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-27 19:10:10,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-27 19:10:10,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-27 19:10:10,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-27 19:10:10,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-27 19:10:10,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step240000 is ready now! 21: [2023-04-27 19:10:10,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 21: [2023-04-27 19:10:10,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 21: [2023-04-27 19:10:10,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 21: [2023-04-27 19:10:10,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 21: [2023-04-27 19:10:10,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 21: [2023-04-27 19:10:10,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 
17: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2023-04-27 19:10:10,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-27 19:10:10,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 3: [2023-04-27 19:10:10,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-27 19:10:10,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-27 19:10:10,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 16: [2023-04-27 19:10:10,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2023-04-27 19:10:10,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-27 19:10:10,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 5: [2023-04-27 19:10:10,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 
18: [2023-04-27 19:10:10,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-27 19:10:10,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 18: [2023-04-27 19:10:10,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-27 19:10:10,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-27 19:10:10,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-27 19:10:10,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 18: [2023-04-27 19:10:10,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-27 19:10:10,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 18: [2023-04-27 19:10:10,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 0: [2023-04-27 19:10:10,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-04-27 19:10:10,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 1: [2023-04-27 19:10:10,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-27 19:10:10,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-04-27 19:10:10,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-27 19:10:10,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-27 19:10:10,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 1: [2023-04-27 19:10:10,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
5: [2023-04-27 19:10:10,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-27 19:10:10,279] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 5: [2023-04-27 19:10:10,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-04-27 19:10:10,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-27 19:10:10,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 5: [2023-04-27 19:10:10,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-04-27 19:10:10,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-27 19:10:10,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 0: [2023-04-27 19:10:10,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-04-27 19:10:10,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-27 19:10:10,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 1: [2023-04-27 19:10:10,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-04-27 19:10:10,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-27 19:10:10,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 
20: [2023-04-27 19:10:10,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-27 19:10:10,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-27 19:10:10,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-27 19:10:10,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-27 19:10:10,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-27 19:10:10,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2023-04-27 19:10:10,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-27 19:10:10,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
1: [2023-04-27 19:10:10,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 1: [2023-04-27 19:10:10,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 20: [2023-04-27 19:10:10,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 1: [2023-04-27 19:10:10,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 1: [2023-04-27 19:10:10,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-04-27 19:10:10,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-04-27 19:10:10,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 1: [2023-04-27 19:10:10,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-04-27 19:10:10,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-27 19:10:10,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 1: [2023-04-27 19:10:10,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-27 19:10:10,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-04-27 19:10:10,298] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-27 19:10:10,298] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-27 19:10:10,298] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 1: [2023-04-27 19:10:10,298] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 6: [2023-04-27 19:10:10,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-27 19:10:10,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-27 19:10:10,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-04-27 19:10:10,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-04-27 19:10:10,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-27 19:10:10,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-27 19:10:10,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-27 19:10:10,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-27 19:10:10,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-27 19:10:10,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-27 19:10:10,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 6: [2023-04-27 19:10:10,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 6: [2023-04-27 19:10:10,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 6: [2023-04-27 19:10:10,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 6: [2023-04-27 19:10:10,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 
26: [2023-04-27 19:10:10,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-27 19:10:10,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-27 19:10:10,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-27 19:10:10,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-27 19:10:10,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-27 19:10:10,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-27 19:10:10,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-27 19:10:10,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 26: [2023-04-27 19:10:10,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-04-27 19:10:10,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-04-27 19:10:10,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-27 19:10:10,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-27 19:10:10,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-27 19:10:10,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-27 19:10:10,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-27 19:10:10,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 7: [2023-04-27 19:10:10,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 12: [2023-04-27 19:10:10,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-27 19:10:10,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-27 19:10:10,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2023-04-27 19:10:10,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-27 19:10:10,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-27 19:10:10,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-27 19:10:10,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-27 19:10:10,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-27 19:10:10,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
12: [2023-04-27 19:10:10,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 12: [2023-04-27 19:10:10,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 12: [2023-04-27 19:10:10,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 12: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 12: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2023-04-27 19:10:10,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2023-04-27 19:10:10,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 12: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 2: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-04-27 19:10:10,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-04-27 19:10:10,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-27 19:10:10,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-27 19:10:10,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-27 19:10:10,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-27 19:10:10,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-27 19:10:10,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-04-27 19:10:10,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-27 19:10:10,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 2: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 2: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 2: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 2: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 2: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 2: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 2: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 12: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-27 19:10:10,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
12: [2023-04-27 19:10:10,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 12: [2023-04-27 19:10:10,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-27 19:10:10,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 6: [2023-04-27 19:10:10,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-27 19:10:10,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-27 19:10:10,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 24: [2023-04-27 19:10:10,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-27 19:10:10,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 24: [2023-04-27 19:10:10,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 24: [2023-04-27 19:10:10,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-27 19:10:10,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 
24: [2023-04-27 19:10:10,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2023-04-27 19:10:10,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-27 19:10:10,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-27 19:10:10,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-27 19:10:10,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-27 19:10:10,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-27 19:10:10,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2023-04-27 19:10:10,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 24: [2023-04-27 19:10:10,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 24: [2023-04-27 19:10:10,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 24: [2023-04-27 19:10:10,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 24: [2023-04-27 19:10:10,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
24: [2023-04-27 19:10:10,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 6: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-04-27 19:10:10,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 
25: [2023-04-27 19:10:10,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-27 19:10:10,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-27 19:10:10,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 25: [2023-04-27 19:10:10,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-27 19:10:10,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-27 19:10:10,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-27 19:10:10,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 25: [2023-04-27 19:10:10,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 24: [2023-04-27 19:10:10,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-27 19:10:10,333] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-27 19:10:10,333] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 21: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 21: [2023-04-27 19:10:10,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 
11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-27 19:10:10,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2023-04-27 19:10:10,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-27 19:10:10,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-27 19:10:10,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-27 19:10:10,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-27 19:10:10,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 11: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 24: [2023-04-27 19:10:10,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-27 19:10:10,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2023-04-27 19:10:10,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 11: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 11: [2023-04-27 19:10:10,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 11: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 
15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 11: [2023-04-27 19:10:10,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 15: [2023-04-27 19:10:10,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-27 19:10:10,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-27 19:10:10,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-27 19:10:10,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-27 19:10:10,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-27 19:10:10,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2023-04-27 19:10:10,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 11: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 15: [2023-04-27 19:10:10,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 15: [2023-04-27 19:10:10,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 15: [2023-04-27 19:10:10,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-27 19:10:10,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-04-27 19:10:10,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-04-27 19:10:10,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-04-27 19:10:10,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-27 19:10:10,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-04-27 19:10:10,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-27 19:10:10,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-27 19:10:10,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-04-27 19:10:10,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 4: [2023-04-27 19:10:10,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 
22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2023-04-27 19:10:10,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-27 19:10:10,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-27 19:10:10,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2023-04-27 19:10:10,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-27 19:10:10,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-27 19:10:10,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-27 19:10:10,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-27 19:10:10,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 22: [2023-04-27 19:10:10,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 6: [2023-04-27 19:10:10,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-27 19:10:10,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-27 19:10:10,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 
31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-27 19:10:10,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-27 19:10:10,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-27 19:10:10,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-27 19:10:10,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-27 19:10:10,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-27 19:10:10,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-27 19:10:10,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-27 19:10:10,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 31: [2023-04-27 19:10:10,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 7: [2023-04-27 19:10:10,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-27 19:10:10,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-04-27 19:10:10,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 
23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 
23: [2023-04-27 19:10:10,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-27 19:10:10,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-27 19:10:10,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-27 19:10:10,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-27 19:10:10,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2023-04-27 19:10:10,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-27 19:10:10,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-27 19:10:10,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 23: [2023-04-27 19:10:10,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 
27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-27 19:10:10,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-27 19:10:10,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-27 19:10:10,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-27 19:10:10,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-27 19:10:10,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-27 19:10:10,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-27 19:10:10,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-27 19:10:10,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 27: [2023-04-27 19:10:10,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 
29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-27 19:10:10,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2023-04-27 19:10:10,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 29: [2023-04-27 19:10:10,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-27 19:10:10,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-27 19:10:10,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-27 19:10:10,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-27 19:10:10,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-27 19:10:10,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step240000 is ready now! 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 29: [2023-04-27 19:10:10,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 25: [2023-04-27 19:10:10,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2023-04-27 19:10:10,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step240000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-27 19:10:10,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step240000 is ready now! 
0: successfully saved checkpoint at iteration 240000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 3316.60 31: iteration 240100/ 476837 | consumed samples: 61465600 | consumed tokens: 125881548800 | elapsed time per iteration (s): 0.80 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.540469E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 321.816 | TFLOPs: 19.47 | 31: iteration 240200/ 476837 | consumed samples: 61491200 | consumed tokens: 125933977600 | elapsed time per iteration (s): 0.68 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.531979E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.860 | TFLOPs: 22.80 | 31: iteration 240300/ 476837 | consumed samples: 61516800 | consumed tokens: 125986406400 | elapsed time per iteration (s): 0.68 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 2.534574E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.359 | TFLOPs: 22.77 | 31: iteration 240400/ 476837 | consumed samples: 61542400 | consumed tokens: 126038835200 | elapsed time per iteration (s): 0.68 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.535194E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.863 | TFLOPs: 22.80 | 31: iteration 240500/ 476837 | consumed samples: 61568000 | consumed tokens: 126091264000 | elapsed time per iteration (s): 0.68 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.533286E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.406 | TFLOPs: 22.77 | 31: iteration 240600/ 476837 | consumed samples: 61593600 | consumed tokens: 126143692800 | elapsed time per 
iteration (s): 0.68 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.538846E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.792 | TFLOPs: 22.79 | 31: iteration 240700/ 476837 | consumed samples: 61619200 | consumed tokens: 126196121600 | elapsed time per iteration (s): 0.68 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.533763E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.471 | TFLOPs: 22.72 | 31: iteration 240800/ 476837 | consumed samples: 61644800 | consumed tokens: 126248550400 | elapsed time per iteration (s): 0.68 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.540815E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.802 | TFLOPs: 22.80 | 31: iteration 240900/ 476837 | consumed samples: 61670400 | consumed tokens: 126300979200 | elapsed time per iteration (s): 0.68 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.532697E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.431 | TFLOPs: 22.77 | 31: iteration 241000/ 476837 | consumed samples: 61696000 | consumed tokens: 126353408000 | elapsed time per iteration (s): 0.76 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.539530E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 338.513 | TFLOPs: 20.48 | 31: iteration 241100/ 476837 | consumed samples: 61721600 | consumed tokens: 126405836800 | elapsed time per iteration (s): 0.68 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.531753E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.880 | 
TFLOPs: 22.74 | 31: iteration 241200/ 476837 | consumed samples: 61747200 | consumed tokens: 126458265600 | elapsed time per iteration (s): 0.68 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.532403E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.189 | TFLOPs: 22.76 | 31: iteration 241300/ 476837 | consumed samples: 61772800 | consumed tokens: 126510694400 | elapsed time per iteration (s): 0.68 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.530556E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.803 | TFLOPs: 22.80 | 31: iteration 241400/ 476837 | consumed samples: 61798400 | consumed tokens: 126563123200 | elapsed time per iteration (s): 0.68 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.532479E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.741 | TFLOPs: 22.79 | 31: iteration 241500/ 476837 | consumed samples: 61824000 | consumed tokens: 126615552000 | elapsed time per iteration (s): 0.68 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.535128E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.447 | TFLOPs: 22.77 | 31: iteration 241600/ 476837 | consumed samples: 61849600 | consumed tokens: 126667980800 | elapsed time per iteration (s): 0.72 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.537591E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 355.274 | TFLOPs: 21.49 | 31: iteration 241700/ 476837 | consumed samples: 61875200 | consumed tokens: 126720409600 | elapsed time per iteration (s): 0.73 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.536096E+00 | grad norm: 
0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 350.981 | TFLOPs: 21.23 | 31: iteration 241800/ 476837 | consumed samples: 61900800 | consumed tokens: 126772838400 | elapsed time per iteration (s): 0.69 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.537603E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.635 | TFLOPs: 22.30 | 31: iteration 241900/ 476837 | consumed samples: 61926400 | consumed tokens: 126825267200 | elapsed time per iteration (s): 0.68 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.536003E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.789 | TFLOPs: 22.67 | 0: [2023-04-27 19:33:17,008] [INFO] [logging.py:68:log_dist] [Rank 0] step=242000, skipped=0, lr=[0.00010928292979567418, 0.00010928292979567418, 0.00010928292979567418], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 242000/ 476837 | consumed samples: 61952000 | consumed tokens: 126877696000 | elapsed time per iteration (s): 0.68 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.533040E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.089 | TFLOPs: 22.69 | 0: steps: 242000 loss: 2.5306 iter time (s): 0.691 samples/sec: 370.581 31: iteration 242100/ 476837 | consumed samples: 61977600 | consumed tokens: 126930124800 | elapsed time per iteration (s): 0.69 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.551543E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.560 | TFLOPs: 22.60 | 31: iteration 242200/ 476837 | consumed samples: 62003200 | consumed tokens: 126982553600 | elapsed time per iteration (s): 0.68 | learning rate: 1.092E-04 | 
global batch size: 256 | lm loss: 2.535556E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.079 | TFLOPs: 22.75 | 31: iteration 242300/ 476837 | consumed samples: 62028800 | consumed tokens: 127034982400 | elapsed time per iteration (s): 0.68 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.538434E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.694 | TFLOPs: 22.79 | 31: iteration 242400/ 476837 | consumed samples: 62054400 | consumed tokens: 127087411200 | elapsed time per iteration (s): 0.69 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.535988E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.298 | TFLOPs: 22.34 | 31: iteration 242500/ 476837 | consumed samples: 62080000 | consumed tokens: 127139840000 | elapsed time per iteration (s): 0.70 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.538794E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.958 | TFLOPs: 22.20 | 31: iteration 242600/ 476837 | consumed samples: 62105600 | consumed tokens: 127192268800 | elapsed time per iteration (s): 0.68 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.537955E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.000 | TFLOPs: 22.69 | 31: iteration 242700/ 476837 | consumed samples: 62131200 | consumed tokens: 127244697600 | elapsed time per iteration (s): 0.71 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.535847E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 362.990 | TFLOPs: 21.96 | 31: iteration 242800/ 476837 | consumed 
samples: 62156800 | consumed tokens: 127297126400 | elapsed time per iteration (s): 0.68 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.533518E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.928 | TFLOPs: 22.74 | 31: iteration 242900/ 476837 | consumed samples: 62182400 | consumed tokens: 127349555200 | elapsed time per iteration (s): 0.68 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.536011E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.596 | TFLOPs: 22.78 | 31: iteration 243000/ 476837 | consumed samples: 62208000 | consumed tokens: 127401984000 | elapsed time per iteration (s): 0.71 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.534847E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 358.713 | TFLOPs: 21.70 | 31: iteration 243100/ 476837 | consumed samples: 62233600 | consumed tokens: 127454412800 | elapsed time per iteration (s): 0.68 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.530813E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.393 | TFLOPs: 22.77 | 31: iteration 243200/ 476837 | consumed samples: 62259200 | consumed tokens: 127506841600 | elapsed time per iteration (s): 0.70 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.530929E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.116 | TFLOPs: 22.09 | 31: iteration 243300/ 476837 | consumed samples: 62284800 | consumed tokens: 127559270400 | elapsed time per iteration (s): 0.68 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 2.536694E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 376.851 | TFLOPs: 22.80 | 31: iteration 243400/ 476837 | consumed samples: 62310400 | consumed tokens: 127611699200 | elapsed time per iteration (s): 1.07 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.536355E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 239.622 | TFLOPs: 14.50 | 31: iteration 243500/ 476837 | consumed samples: 62336000 | consumed tokens: 127664128000 | elapsed time per iteration (s): 0.70 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.532275E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 365.731 | TFLOPs: 22.13 | 31: iteration 243600/ 476837 | consumed samples: 62361600 | consumed tokens: 127716556800 | elapsed time per iteration (s): 0.74 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.534181E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 345.229 | TFLOPs: 20.89 | 31: iteration 243700/ 476837 | consumed samples: 62387200 | consumed tokens: 127768985600 | elapsed time per iteration (s): 0.69 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.530930E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.411 | TFLOPs: 22.35 | 31: iteration 243800/ 476837 | consumed samples: 62412800 | consumed tokens: 127821414400 | elapsed time per iteration (s): 0.82 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.532055E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 312.187 | TFLOPs: 18.89 | 31: iteration 243900/ 476837 | consumed samples: 62438400 | consumed tokens: 127873843200 | elapsed time per iteration (s): 0.82 | learning rate: 1.081E-04 
| global batch size: 256 | lm loss: 2.532597E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 313.401 | TFLOPs: 18.96 | 0: [2023-04-27 19:57:34,315] [INFO] [logging.py:68:log_dist] [Rank 0] step=244000, skipped=0, lr=[0.00010808517628408425, 0.00010808517628408425, 0.00010808517628408425], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 244000/ 476837 | consumed samples: 62464000 | consumed tokens: 127926272000 | elapsed time per iteration (s): 0.77 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.536003E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 330.786 | TFLOPs: 20.01 | 0: steps: 244000 loss: 2.5203 iter time (s): 0.726 samples/sec: 352.773 31: iteration 244100/ 476837 | consumed samples: 62489600 | consumed tokens: 127978700800 | elapsed time per iteration (s): 0.68 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.533294E+00 | grad norm: 0.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.598 | TFLOPs: 22.66 | 31: iteration 244200/ 476837 | consumed samples: 62515200 | consumed tokens: 128031129600 | elapsed time per iteration (s): 0.75 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.534448E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 343.031 | TFLOPs: 20.75 | 31: iteration 244300/ 476837 | consumed samples: 62540800 | consumed tokens: 128083558400 | elapsed time per iteration (s): 0.68 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.532424E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.883 | TFLOPs: 22.80 | 31: iteration 244400/ 476837 | consumed samples: 62566400 | consumed tokens: 128135987200 | elapsed 
time per iteration (s): 0.96 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.532503E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 266.499 | TFLOPs: 16.12 | 31: iteration 244500/ 476837 | consumed samples: 62592000 | consumed tokens: 128188416000 | elapsed time per iteration (s): 0.68 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.534420E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.034 | TFLOPs: 22.81 | 31: iteration 244600/ 476837 | consumed samples: 62617600 | consumed tokens: 128240844800 | elapsed time per iteration (s): 0.68 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.535714E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.026 | TFLOPs: 22.81 | 31: iteration 244700/ 476837 | consumed samples: 62643200 | consumed tokens: 128293273600 | elapsed time per iteration (s): 0.84 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.537019E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 303.450 | TFLOPs: 18.36 | 31: iteration 244800/ 476837 | consumed samples: 62668800 | consumed tokens: 128345702400 | elapsed time per iteration (s): 0.68 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.529311E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.063 | TFLOPs: 22.63 | 31: iteration 244900/ 476837 | consumed samples: 62694400 | consumed tokens: 128398131200 | elapsed time per iteration (s): 0.69 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.530718E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.044 
| TFLOPs: 22.57 | 31: iteration 245000/ 476837 | consumed samples: 62720000 | consumed tokens: 128450560000 | elapsed time per iteration (s): 0.68 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.533827E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.077 | TFLOPs: 22.75 | 31: iteration 245100/ 476837 | consumed samples: 62745600 | consumed tokens: 128502988800 | elapsed time per iteration (s): 0.84 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.530595E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 306.251 | TFLOPs: 18.53 | 31: iteration 245200/ 476837 | consumed samples: 62771200 | consumed tokens: 128555417600 | elapsed time per iteration (s): 0.68 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.533145E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.973 | TFLOPs: 22.81 | 31: iteration 245300/ 476837 | consumed samples: 62796800 | consumed tokens: 128607846400 | elapsed time per iteration (s): 0.72 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.531163E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 357.467 | TFLOPs: 21.63 | 31: iteration 245400/ 476837 | consumed samples: 62822400 | consumed tokens: 128660275200 | elapsed time per iteration (s): 0.68 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.533888E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.888 | TFLOPs: 22.80 | 31: iteration 245500/ 476837 | consumed samples: 62848000 | consumed tokens: 128712704000 | elapsed time per iteration (s): 0.76 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.527927E+00 | grad 
norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 337.953 | TFLOPs: 20.45 | 31: iteration 245600/ 476837 | consumed samples: 62873600 | consumed tokens: 128765132800 | elapsed time per iteration (s): 0.68 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.530549E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.924 | TFLOPs: 22.80 | 31: iteration 245700/ 476837 | consumed samples: 62899200 | consumed tokens: 128817561600 | elapsed time per iteration (s): 0.68 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.532604E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.937 | TFLOPs: 22.80 | 31: iteration 245800/ 476837 | consumed samples: 62924800 | consumed tokens: 128869990400 | elapsed time per iteration (s): 0.86 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.535586E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 297.735 | TFLOPs: 18.01 | 31: iteration 245900/ 476837 | consumed samples: 62950400 | consumed tokens: 128922419200 | elapsed time per iteration (s): 0.68 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.529396E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.145 | TFLOPs: 22.82 | 0: [2023-04-27 20:21:50,982] [INFO] [logging.py:68:log_dist] [Rank 0] step=246000, skipped=0, lr=[0.00010688776198475971, 0.00010688776198475971, 0.00010688776198475971], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 246000/ 476837 | consumed samples: 62976000 | consumed tokens: 128974848000 | elapsed time per iteration (s): 0.68 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.529780E+00 | grad norm: 0.369 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.864 | TFLOPs: 22.80 | 0: steps: 246000 loss: 2.5898 iter time (s): 0.725 samples/sec: 353.134 31: iteration 246100/ 476837 | consumed samples: 63001600 | consumed tokens: 129027276800 | elapsed time per iteration (s): 0.68 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.532529E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.676 | TFLOPs: 22.73 | 31: iteration 246200/ 476837 | consumed samples: 63027200 | consumed tokens: 129079705600 | elapsed time per iteration (s): 0.82 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.526789E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 312.565 | TFLOPs: 18.91 | 31: iteration 246300/ 476837 | consumed samples: 63052800 | consumed tokens: 129132134400 | elapsed time per iteration (s): 0.77 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.536422E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 331.110 | TFLOPs: 20.03 | 31: iteration 246400/ 476837 | consumed samples: 63078400 | consumed tokens: 129184563200 | elapsed time per iteration (s): 0.75 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.526395E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 343.178 | TFLOPs: 20.76 | 31: iteration 246500/ 476837 | consumed samples: 63104000 | consumed tokens: 129236992000 | elapsed time per iteration (s): 0.68 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.530273E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.432 | TFLOPs: 22.77 | 31: iteration 246600/ 476837 | 
consumed samples: 63129600 | consumed tokens: 129289420800 | elapsed time per iteration (s): 0.80 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.531060E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 320.502 | TFLOPs: 19.39 | 31: iteration 246700/ 476837 | consumed samples: 63155200 | consumed tokens: 129341849600 | elapsed time per iteration (s): 0.68 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.532903E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.952 | TFLOPs: 22.80 | 31: iteration 246800/ 476837 | consumed samples: 63180800 | consumed tokens: 129394278400 | elapsed time per iteration (s): 0.68 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 2.530381E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.967 | TFLOPs: 22.81 | 31: iteration 246900/ 476837 | consumed samples: 63206400 | consumed tokens: 129446707200 | elapsed time per iteration (s): 0.68 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.531382E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.388 | TFLOPs: 22.77 | 31: iteration 247000/ 476837 | consumed samples: 63232000 | consumed tokens: 129499136000 | elapsed time per iteration (s): 0.68 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.530515E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.832 | TFLOPs: 22.74 | 31: iteration 247100/ 476837 | consumed samples: 63257600 | consumed tokens: 129551564800 | elapsed time per iteration (s): 0.68 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.528510E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 376.899 | TFLOPs: 22.80 | 31: iteration 247200/ 476837 | consumed samples: 63283200 | consumed tokens: 129603993600 | elapsed time per iteration (s): 0.69 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.527961E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.660 | TFLOPs: 22.61 | 31: iteration 247300/ 476837 | consumed samples: 63308800 | consumed tokens: 129656422400 | elapsed time per iteration (s): 0.68 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.529135E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.992 | TFLOPs: 22.69 | 31: iteration 247400/ 476837 | consumed samples: 63334400 | consumed tokens: 129708851200 | elapsed time per iteration (s): 0.68 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.528685E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.286 | TFLOPs: 22.70 | 31: iteration 247500/ 476837 | consumed samples: 63360000 | consumed tokens: 129761280000 | elapsed time per iteration (s): 0.68 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.527369E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.593 | TFLOPs: 22.66 | 31: iteration 247600/ 476837 | consumed samples: 63385600 | consumed tokens: 129813708800 | elapsed time per iteration (s): 0.68 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.528043E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.697 | TFLOPs: 22.79 | 31: iteration 247700/ 476837 | consumed samples: 63411200 | consumed tokens: 129866137600 | elapsed time per iteration (s): 0.68 | learning 
rate: 1.059E-04 | global batch size: 256 | lm loss: 2.532866E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.037 | TFLOPs: 22.81 | 31: iteration 247800/ 476837 | consumed samples: 63436800 | consumed tokens: 129918566400 | elapsed time per iteration (s): 0.68 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.528609E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.109 | TFLOPs: 22.81 | 31: iteration 247900/ 476837 | consumed samples: 63462400 | consumed tokens: 129970995200 | elapsed time per iteration (s): 0.68 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.528029E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.594 | TFLOPs: 22.72 | 0: [2023-04-27 20:45:14,245] [INFO] [logging.py:68:log_dist] [Rank 0] step=248000, skipped=0, lr=[0.00010569089902042168, 0.00010569089902042168, 0.00010569089902042168], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 248000/ 476837 | consumed samples: 63488000 | consumed tokens: 130023424000 | elapsed time per iteration (s): 0.68 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 2.527617E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.754 | TFLOPs: 22.67 | 0: steps: 248000 loss: 2.5203 iter time (s): 0.698 samples/sec: 366.691 31: iteration 248100/ 476837 | consumed samples: 63513600 | consumed tokens: 130075852800 | elapsed time per iteration (s): 0.68 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.529326E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.013 | TFLOPs: 22.81 | 31: iteration 248200/ 476837 | consumed samples: 63539200 | consumed tokens: 
130128281600 | elapsed time per iteration (s): 0.68 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.532428E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.067 | TFLOPs: 22.81 | 31: iteration 248300/ 476837 | consumed samples: 63564800 | consumed tokens: 130180710400 | elapsed time per iteration (s): 0.68 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.532635E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.919 | TFLOPs: 22.80 | 31: iteration 248400/ 476837 | consumed samples: 63590400 | consumed tokens: 130233139200 | elapsed time per iteration (s): 0.68 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.522811E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.340 | TFLOPs: 22.71 | 31: iteration 248500/ 476837 | consumed samples: 63616000 | consumed tokens: 130285568000 | elapsed time per iteration (s): 0.68 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 2.528625E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.723 | TFLOPs: 22.79 | 31: iteration 248600/ 476837 | consumed samples: 63641600 | consumed tokens: 130337996800 | elapsed time per iteration (s): 0.68 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.533026E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.741 | TFLOPs: 22.73 | 31: iteration 248700/ 476837 | consumed samples: 63667200 | consumed tokens: 130390425600 | elapsed time per iteration (s): 0.78 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.528647E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 327.858 | TFLOPs: 19.83 | 31: iteration 248800/ 476837 | consumed samples: 63692800 | consumed tokens: 130442854400 | elapsed time per iteration (s): 0.68 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.528464E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.791 | TFLOPs: 22.79 | 31: iteration 248900/ 476837 | consumed samples: 63718400 | consumed tokens: 130495283200 | elapsed time per iteration (s): 0.68 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.529490E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.684 | TFLOPs: 22.79 | 31: iteration 249000/ 476837 | consumed samples: 63744000 | consumed tokens: 130547712000 | elapsed time per iteration (s): 0.68 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 2.530527E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.823 | TFLOPs: 22.74 | 31: iteration 249100/ 476837 | consumed samples: 63769600 | consumed tokens: 130600140800 | elapsed time per iteration (s): 0.68 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.528163E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.170 | TFLOPs: 22.70 | 31: iteration 249200/ 476837 | consumed samples: 63795200 | consumed tokens: 130652569600 | elapsed time per iteration (s): 0.68 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.525016E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.242 | TFLOPs: 22.76 | 31: iteration 249300/ 476837 | consumed samples: 63820800 | consumed tokens: 130704998400 | elapsed time per iteration (s): 0.68 | learning rate: 1.049E-04 | global batch size: 256 | lm 
loss: 2.522651E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.929 | TFLOPs: 22.80 | 31: iteration 249400/ 476837 | consumed samples: 63846400 | consumed tokens: 130757427200 | elapsed time per iteration (s): 0.68 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 2.526001E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.011 | TFLOPs: 22.75 | 31: iteration 249500/ 476837 | consumed samples: 63872000 | consumed tokens: 130809856000 | elapsed time per iteration (s): 0.68 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.529939E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.993 | TFLOPs: 22.63 | 31: iteration 249600/ 476837 | consumed samples: 63897600 | consumed tokens: 130862284800 | elapsed time per iteration (s): 0.68 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.523150E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.518 | TFLOPs: 22.78 | 31: iteration 249700/ 476837 | consumed samples: 63923200 | consumed tokens: 130914713600 | elapsed time per iteration (s): 0.68 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.525261E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.094 | TFLOPs: 22.81 | 31: iteration 249800/ 476837 | consumed samples: 63948800 | consumed tokens: 130967142400 | elapsed time per iteration (s): 0.68 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.524660E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.069 | TFLOPs: 22.81 | 31: iteration 249900/ 476837 | consumed samples: 63974400 | 
consumed tokens: 131019571200 | elapsed time per iteration (s): 0.68 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.527614E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.973 | TFLOPs: 22.81 |
0: [2023-04-27 21:08:04,723] [INFO] [logging.py:68:log_dist] [Rank 0] step=250000, skipped=0, lr=[0.0001044947994161219, 0.0001044947994161219, 0.0001044947994161219], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
31: iteration 250000/ 476837 | consumed samples: 64000000 | consumed tokens: 131072000000 | elapsed time per iteration (s): 0.68 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.528467E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.689 | TFLOPs: 22.79 |
0: steps: 250000 loss: 2.5517 iter time (s): 0.682 samples/sec: 375.360
31: -------------------------------------------------------------------------------------------------
31: validation loss at iteration 250000 | lm loss value: 2.928724E+00 | lm loss PPL: 1.870375E+01 |
31: -------------------------------------------------------------------------------------------------
31: iteration 250100/ 476837 | consumed samples: 64025600 | consumed tokens: 131124428800 | elapsed time per iteration (s): 0.68 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.527546E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.049 | TFLOPs: 22.75 |
31: iteration 250200/ 476837 | consumed samples: 64051200 | consumed tokens: 131176857600 | elapsed time per iteration (s): 0.68 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.526604E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.016 | TFLOPs: 22.81 |
31: iteration 250300/ 476837 | consumed samples: 
64076800 | consumed tokens: 131229286400 | elapsed time per iteration (s): 0.68 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.526001E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.421 | TFLOPs: 22.77 | 31: iteration 250400/ 476837 | consumed samples: 64102400 | consumed tokens: 131281715200 | elapsed time per iteration (s): 0.68 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.526143E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.663 | TFLOPs: 22.67 | 31: iteration 250500/ 476837 | consumed samples: 64128000 | consumed tokens: 131334144000 | elapsed time per iteration (s): 0.68 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 2.526410E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.637 | TFLOPs: 22.73 | 31: iteration 250600/ 476837 | consumed samples: 64153600 | consumed tokens: 131386572800 | elapsed time per iteration (s): 0.68 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.524011E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.649 | TFLOPs: 22.79 | 31: iteration 250700/ 476837 | consumed samples: 64179200 | consumed tokens: 131439001600 | elapsed time per iteration (s): 0.68 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.527492E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.817 | TFLOPs: 22.80 | 31: iteration 250800/ 476837 | consumed samples: 64204800 | consumed tokens: 131491430400 | elapsed time per iteration (s): 0.68 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.524426E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 377.094 | TFLOPs: 22.81 | 31: iteration 250900/ 476837 | consumed samples: 64230400 | consumed tokens: 131543859200 | elapsed time per iteration (s): 0.68 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.529760E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.798 | TFLOPs: 22.80 | 31: iteration 251000/ 476837 | consumed samples: 64256000 | consumed tokens: 131596288000 | elapsed time per iteration (s): 0.68 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 2.521148E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.106 | TFLOPs: 22.81 | 31: iteration 251100/ 476837 | consumed samples: 64281600 | consumed tokens: 131648716800 | elapsed time per iteration (s): 0.78 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.524777E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 330.308 | TFLOPs: 19.98 | 31: iteration 251200/ 476837 | consumed samples: 64307200 | consumed tokens: 131701145600 | elapsed time per iteration (s): 0.68 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.525485E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.709 | TFLOPs: 22.79 | 31: iteration 251300/ 476837 | consumed samples: 64332800 | consumed tokens: 131753574400 | elapsed time per iteration (s): 0.68 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.528179E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.697 | TFLOPs: 22.79 | 31: iteration 251400/ 476837 | consumed samples: 64358400 | consumed tokens: 131806003200 | elapsed time per iteration (s): 0.68 | learning rate: 1.037E-04 | global 
batch size: 256 | lm loss: 2.526254E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.027 | TFLOPs: 22.81 | 31: iteration 251500/ 476837 | consumed samples: 64384000 | consumed tokens: 131858432000 | elapsed time per iteration (s): 0.68 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 2.527441E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.024 | TFLOPs: 22.81 | 31: iteration 251600/ 476837 | consumed samples: 64409600 | consumed tokens: 131910860800 | elapsed time per iteration (s): 0.68 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.521386E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.619 | TFLOPs: 22.78 | 31: iteration 251700/ 476837 | consumed samples: 64435200 | consumed tokens: 131963289600 | elapsed time per iteration (s): 0.68 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.532025E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.608 | TFLOPs: 22.78 | 31: iteration 251800/ 476837 | consumed samples: 64460800 | consumed tokens: 132015718400 | elapsed time per iteration (s): 0.68 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.522245E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.103 | TFLOPs: 22.75 | 31: iteration 251900/ 476837 | consumed samples: 64486400 | consumed tokens: 132068147200 | elapsed time per iteration (s): 0.68 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.524901E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.737 | TFLOPs: 22.79 | 0: [2023-04-27 21:30:54,279] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=252000, skipped=0, lr=[0.00010329967506168246, 0.00010329967506168246, 0.00010329967506168246], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 252000/ 476837 | consumed samples: 64512000 | consumed tokens: 132120576000 | elapsed time per iteration (s): 0.68 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.523505E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.905 | TFLOPs: 22.68 | 0: steps: 252000 loss: 2.5214 iter time (s): 0.681 samples/sec: 375.800 31: iteration 252100/ 476837 | consumed samples: 64537600 | consumed tokens: 132173004800 | elapsed time per iteration (s): 0.68 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.525206E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.829 | TFLOPs: 22.80 | 31: iteration 252200/ 476837 | consumed samples: 64563200 | consumed tokens: 132225433600 | elapsed time per iteration (s): 0.68 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.528767E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.718 | TFLOPs: 22.73 | 31: iteration 252300/ 476837 | consumed samples: 64588800 | consumed tokens: 132277862400 | elapsed time per iteration (s): 0.68 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.520774E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.647 | TFLOPs: 22.79 | 31: iteration 252400/ 476837 | consumed samples: 64614400 | consumed tokens: 132330291200 | elapsed time per iteration (s): 0.68 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.522811E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
376.690 | TFLOPs: 22.79 | 31: iteration 252500/ 476837 | consumed samples: 64640000 | consumed tokens: 132382720000 | elapsed time per iteration (s): 0.68 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.525161E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.241 | TFLOPs: 22.76 | 31: iteration 252600/ 476837 | consumed samples: 64665600 | consumed tokens: 132435148800 | elapsed time per iteration (s): 0.68 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.525470E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.387 | TFLOPs: 22.77 | 31: iteration 252700/ 476837 | consumed samples: 64691200 | consumed tokens: 132487577600 | elapsed time per iteration (s): 0.68 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.529740E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.551 | TFLOPs: 22.78 | 31: iteration 252800/ 476837 | consumed samples: 64716800 | consumed tokens: 132540006400 | elapsed time per iteration (s): 0.69 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.524762E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.869 | TFLOPs: 22.56 | 31: iteration 252900/ 476837 | consumed samples: 64742400 | consumed tokens: 132592435200 | elapsed time per iteration (s): 0.69 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.530523E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.494 | TFLOPs: 22.53 | 31: iteration 253000/ 476837 | consumed samples: 64768000 | consumed tokens: 132644864000 | elapsed time per iteration (s): 0.68 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.520480E+00 | 
grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.850 | TFLOPs: 22.68 | 31: iteration 253100/ 476837 | consumed samples: 64793600 | consumed tokens: 132697292800 | elapsed time per iteration (s): 0.69 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.525630E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.219 | TFLOPs: 22.58 | 31: iteration 253200/ 476837 | consumed samples: 64819200 | consumed tokens: 132749721600 | elapsed time per iteration (s): 0.68 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.524048E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.608 | TFLOPs: 22.66 | 31: iteration 253300/ 476837 | consumed samples: 64844800 | consumed tokens: 132802150400 | elapsed time per iteration (s): 0.68 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.524063E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.168 | TFLOPs: 22.70 | 31: iteration 253400/ 476837 | consumed samples: 64870400 | consumed tokens: 132854579200 | elapsed time per iteration (s): 0.68 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.529246E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.906 | TFLOPs: 22.80 | 31: iteration 253500/ 476837 | consumed samples: 64896000 | consumed tokens: 132907008000 | elapsed time per iteration (s): 0.77 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.525017E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 333.615 | TFLOPs: 20.18 | 31: iteration 253600/ 476837 | consumed samples: 64921600 | consumed tokens: 
132959436800 | elapsed time per iteration (s): 0.69 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.527188E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.212 | TFLOPs: 22.40 | 31: iteration 253700/ 476837 | consumed samples: 64947200 | consumed tokens: 133011865600 | elapsed time per iteration (s): 0.68 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.521807E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.742 | TFLOPs: 22.67 | 31: iteration 253800/ 476837 | consumed samples: 64972800 | consumed tokens: 133064294400 | elapsed time per iteration (s): 0.68 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.522318E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.159 | TFLOPs: 22.76 | 31: iteration 253900/ 476837 | consumed samples: 64998400 | consumed tokens: 133116723200 | elapsed time per iteration (s): 0.68 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.526958E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.547 | TFLOPs: 22.78 | 0: [2023-04-27 21:53:47,893] [INFO] [logging.py:68:log_dist] [Rank 0] step=254000, skipped=0, lr=[0.00010210573767415922, 0.00010210573767415922, 0.00010210573767415922], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 254000/ 476837 | consumed samples: 65024000 | consumed tokens: 133169152000 | elapsed time per iteration (s): 0.69 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 2.520914E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.360 | TFLOPs: 22.59 | 0: steps: 254000 loss: 2.5765 iter time (s): 0.684 samples/sec: 374.513 31: iteration 254100/ 
476837 | consumed samples: 65049600 | consumed tokens: 133221580800 | elapsed time per iteration (s): 0.68 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.524396E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.011 | TFLOPs: 22.69 | 31: iteration 254200/ 476837 | consumed samples: 65075200 | consumed tokens: 133274009600 | elapsed time per iteration (s): 0.68 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.523328E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.065 | TFLOPs: 22.75 | 31: iteration 254300/ 476837 | consumed samples: 65100800 | consumed tokens: 133326438400 | elapsed time per iteration (s): 0.68 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.522341E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.117 | TFLOPs: 22.69 | 31: iteration 254400/ 476837 | consumed samples: 65126400 | consumed tokens: 133378867200 | elapsed time per iteration (s): 0.69 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.526022E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.099 | TFLOPs: 22.51 | 31: iteration 254500/ 476837 | consumed samples: 65152000 | consumed tokens: 133431296000 | elapsed time per iteration (s): 0.68 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.526899E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.548 | TFLOPs: 22.78 | 31: iteration 254600/ 476837 | consumed samples: 65177600 | consumed tokens: 133483724800 | elapsed time per iteration (s): 0.68 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.519013E+00 | grad norm: 0.307 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.684 | TFLOPs: 22.79 | 31: iteration 254700/ 476837 | consumed samples: 65203200 | consumed tokens: 133536153600 | elapsed time per iteration (s): 0.68 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.520177E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.653 | TFLOPs: 22.73 | 31: iteration 254800/ 476837 | consumed samples: 65228800 | consumed tokens: 133588582400 | elapsed time per iteration (s): 0.68 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.520967E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.526 | TFLOPs: 22.72 | 31: iteration 254900/ 476837 | consumed samples: 65254400 | consumed tokens: 133641011200 | elapsed time per iteration (s): 0.68 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.523155E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.450 | TFLOPs: 22.77 | 31: iteration 255000/ 476837 | consumed samples: 65280000 | consumed tokens: 133693440000 | elapsed time per iteration (s): 0.68 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 2.527404E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.655 | TFLOPs: 22.79 | 31: iteration 255100/ 476837 | consumed samples: 65305600 | consumed tokens: 133745868800 | elapsed time per iteration (s): 0.68 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.524264E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.361 | TFLOPs: 22.71 | 31: iteration 255200/ 476837 | consumed samples: 65331200 | consumed tokens: 133798297600 | elapsed time per iteration (s): 0.69 | 
learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.524160E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.833 | TFLOPs: 22.56 | 31: iteration 255300/ 476837 | consumed samples: 65356800 | consumed tokens: 133850726400 | elapsed time per iteration (s): 0.68 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.520670E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.387 | TFLOPs: 22.77 | 31: iteration 255400/ 476837 | consumed samples: 65382400 | consumed tokens: 133903155200 | elapsed time per iteration (s): 0.68 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.518903E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.687 | TFLOPs: 22.79 | 31: iteration 255500/ 476837 | consumed samples: 65408000 | consumed tokens: 133955584000 | elapsed time per iteration (s): 0.68 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.517629E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.755 | TFLOPs: 22.79 | 31: iteration 255600/ 476837 | consumed samples: 65433600 | consumed tokens: 134008012800 | elapsed time per iteration (s): 0.68 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.524870E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.679 | TFLOPs: 22.79 | 31: iteration 255700/ 476837 | consumed samples: 65459200 | consumed tokens: 134060441600 | elapsed time per iteration (s): 0.68 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.523752E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.149 | TFLOPs: 22.76 | 31: 
iteration 255800/ 476837 | consumed samples: 65484800 | consumed tokens: 134112870400 | elapsed time per iteration (s): 0.68 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.525523E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.748 | TFLOPs: 22.79 | 31: iteration 255900/ 476837 | consumed samples: 65510400 | consumed tokens: 134165299200 | elapsed time per iteration (s): 0.68 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.523409E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.501 | TFLOPs: 22.72 | 0: [2023-04-27 22:16:41,051] [INFO] [logging.py:68:log_dist] [Rank 0] step=256000, skipped=0, lr=[0.00010091319876033617, 0.00010091319876033617, 0.00010091319876033617], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 256000/ 476837 | consumed samples: 65536000 | consumed tokens: 134217728000 | elapsed time per iteration (s): 0.79 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.518744E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 325.697 | TFLOPs: 19.70 | 0: steps: 256000 loss: 2.5404 iter time (s): 0.683 samples/sec: 374.725 31: iteration 256100/ 476837 | consumed samples: 65561600 | consumed tokens: 134270156800 | elapsed time per iteration (s): 0.68 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.526162E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.475 | TFLOPs: 22.65 | 31: iteration 256200/ 476837 | consumed samples: 65587200 | consumed tokens: 134322585600 | elapsed time per iteration (s): 0.68 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.523179E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 376.546 | TFLOPs: 22.78 | 31: iteration 256300/ 476837 | consumed samples: 65612800 | consumed tokens: 134375014400 | elapsed time per iteration (s): 0.68 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.522949E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.157 | TFLOPs: 22.76 | 31: iteration 256400/ 476837 | consumed samples: 65638400 | consumed tokens: 134427443200 | elapsed time per iteration (s): 0.68 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.523435E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.616 | TFLOPs: 22.78 | 31: iteration 256500/ 476837 | consumed samples: 65664000 | consumed tokens: 134479872000 | elapsed time per iteration (s): 0.68 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.521137E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.523 | TFLOPs: 22.78 | 31: iteration 256600/ 476837 | consumed samples: 65689600 | consumed tokens: 134532300800 | elapsed time per iteration (s): 0.68 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.528457E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.682 | TFLOPs: 22.79 | 31: iteration 256700/ 476837 | consumed samples: 65715200 | consumed tokens: 134584729600 | elapsed time per iteration (s): 0.68 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.522733E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.593 | TFLOPs: 22.78 | 31: iteration 256800/ 476837 | consumed samples: 65740800 | consumed tokens: 134637158400 | elapsed time per iteration (s): 0.68 | learning rate: 1.004E-04 | global batch 
size: 256 | lm loss: 2.521631E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.312 | TFLOPs: 22.64 | 31: iteration 256900/ 476837 | consumed samples: 65766400 | consumed tokens: 134689587200 | elapsed time per iteration (s): 0.68 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 2.516904E+00 | grad norm: 0.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.506 | TFLOPs: 22.78 | 31: iteration 257000/ 476837 | consumed samples: 65792000 | consumed tokens: 134742016000 | elapsed time per iteration (s): 0.68 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.516748E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.603 | TFLOPs: 22.78 | 31: iteration 257100/ 476837 | consumed samples: 65817600 | consumed tokens: 134794444800 | elapsed time per iteration (s): 0.68 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.523186E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.626 | TFLOPs: 22.78 | 31: iteration 257200/ 476837 | consumed samples: 65843200 | consumed tokens: 134846873600 | elapsed time per iteration (s): 0.68 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.524030E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.546 | TFLOPs: 22.78 | 31: iteration 257300/ 476837 | consumed samples: 65868800 | consumed tokens: 134899302400 | elapsed time per iteration (s): 0.68 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.522948E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.449 | TFLOPs: 22.77 | 31: iteration 257400/ 476837 | consumed samples: 
65894400 | consumed tokens: 134951731200 | elapsed time per iteration (s): 0.68 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.521195E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.392 | TFLOPs: 22.77 | 31: iteration 257500/ 476837 | consumed samples: 65920000 | consumed tokens: 135004160000 | elapsed time per iteration (s): 0.68 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 2.521522E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.035 | TFLOPs: 22.75 | 31: iteration 257600/ 476837 | consumed samples: 65945600 | consumed tokens: 135056588800 | elapsed time per iteration (s): 0.68 | learning rate: 9.996E-05 | global batch size: 256 | lm loss: 2.517098E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.506 | TFLOPs: 22.78 | 31: iteration 257700/ 476837 | consumed samples: 65971200 | consumed tokens: 135109017600 | elapsed time per iteration (s): 0.68 | learning rate: 9.990E-05 | global batch size: 256 | lm loss: 2.521201E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.890 | TFLOPs: 22.62 | 31: iteration 257800/ 476837 | consumed samples: 65996800 | consumed tokens: 135161446400 | elapsed time per iteration (s): 0.68 | learning rate: 9.984E-05 | global batch size: 256 | lm loss: 2.521845E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.326 | TFLOPs: 22.71 | 31: iteration 257900/ 476837 | consumed samples: 66022400 | consumed tokens: 135213875200 | elapsed time per iteration (s): 0.68 | learning rate: 9.978E-05 | global batch size: 256 | lm loss: 2.520218E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 375.687 | TFLOPs: 22.73 | 0: [2023-04-27 22:39:22,580] [INFO] [logging.py:68:log_dist] [Rank 0] step=258000, skipped=0, lr=[9.972226957925664e-05, 9.972226957925664e-05, 9.972226957925664e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 258000/ 476837 | consumed samples: 66048000 | consumed tokens: 135266304000 | elapsed time per iteration (s): 0.68 | learning rate: 9.972E-05 | global batch size: 256 | lm loss: 2.522096E+00 | grad norm: 0.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.507 | TFLOPs: 22.78 | 0: steps: 258000 loss: 2.5496 iter time (s): 0.677 samples/sec: 377.912 31: iteration 258100/ 476837 | consumed samples: 66073600 | consumed tokens: 135318732800 | elapsed time per iteration (s): 0.68 | learning rate: 9.966E-05 | global batch size: 256 | lm loss: 2.515504E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.635 | TFLOPs: 22.79 | 31: iteration 258200/ 476837 | consumed samples: 66099200 | consumed tokens: 135371161600 | elapsed time per iteration (s): 0.68 | learning rate: 9.960E-05 | global batch size: 256 | lm loss: 2.520189E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.826 | TFLOPs: 22.80 | 31: iteration 258300/ 476837 | consumed samples: 66124800 | consumed tokens: 135423590400 | elapsed time per iteration (s): 0.68 | learning rate: 9.954E-05 | global batch size: 256 | lm loss: 2.519586E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.519 | TFLOPs: 22.78 | 31: iteration 258400/ 476837 | consumed samples: 66150400 | consumed tokens: 135476019200 | elapsed time per iteration (s): 0.71 | learning rate: 9.948E-05 | global batch size: 256 | lm loss: 2.521935E+00 | grad norm: 0.330 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 362.590 | TFLOPs: 21.94 | 31: iteration 258500/ 476837 | consumed samples: 66176000 | consumed tokens: 135528448000 | elapsed time per iteration (s): 0.76 | learning rate: 9.942E-05 | global batch size: 256 | lm loss: 2.520482E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 335.534 | TFLOPs: 20.30 | 31: iteration 258600/ 476837 | consumed samples: 66201600 | consumed tokens: 135580876800 | elapsed time per iteration (s): 0.69 | learning rate: 9.937E-05 | global batch size: 256 | lm loss: 2.519791E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.948 | TFLOPs: 22.56 | 31: iteration 258700/ 476837 | consumed samples: 66227200 | consumed tokens: 135633305600 | elapsed time per iteration (s): 0.69 | learning rate: 9.931E-05 | global batch size: 256 | lm loss: 2.518470E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.001 | TFLOPs: 22.51 | 31: iteration 258800/ 476837 | consumed samples: 66252800 | consumed tokens: 135685734400 | elapsed time per iteration (s): 0.69 | learning rate: 9.925E-05 | global batch size: 256 | lm loss: 2.518049E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.484 | TFLOPs: 22.47 | 31: iteration 258900/ 476837 | consumed samples: 66278400 | consumed tokens: 135738163200 | elapsed time per iteration (s): 0.68 | learning rate: 9.919E-05 | global batch size: 256 | lm loss: 2.514922E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.118 | TFLOPs: 22.63 | 31: iteration 259000/ 476837 | consumed samples: 66304000 | consumed tokens: 135790592000 | elapsed time per 
iteration (s): 0.68 | learning rate: 9.913E-05 | global batch size: 256 | lm loss: 2.516908E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.223 | TFLOPs: 22.76 | 31: iteration 259100/ 476837 | consumed samples: 66329600 | consumed tokens: 135843020800 | elapsed time per iteration (s): 0.68 | learning rate: 9.907E-05 | global batch size: 256 | lm loss: 2.520258E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.624 | TFLOPs: 22.78 | 31: iteration 259200/ 476837 | consumed samples: 66355200 | consumed tokens: 135895449600 | elapsed time per iteration (s): 0.68 | learning rate: 9.901E-05 | global batch size: 256 | lm loss: 2.519017E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.736 | TFLOPs: 22.73 | 31: iteration 259300/ 476837 | consumed samples: 66380800 | consumed tokens: 135947878400 | elapsed time per iteration (s): 0.68 | learning rate: 9.895E-05 | global batch size: 256 | lm loss: 2.518941E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.375 | TFLOPs: 22.65 | 31: iteration 259400/ 476837 | consumed samples: 66406400 | consumed tokens: 136000307200 | elapsed time per iteration (s): 0.69 | learning rate: 9.889E-05 | global batch size: 256 | lm loss: 2.518673E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.876 | TFLOPs: 22.44 | 31: iteration 259500/ 476837 | consumed samples: 66432000 | consumed tokens: 136052736000 | elapsed time per iteration (s): 0.68 | learning rate: 9.883E-05 | global batch size: 256 | lm loss: 2.515790E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.742 | 
TFLOPs: 22.79 | 31: iteration 259600/ 476837 | consumed samples: 66457600 | consumed tokens: 136105164800 | elapsed time per iteration (s): 0.68 | learning rate: 9.877E-05 | global batch size: 256 | lm loss: 2.514982E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.303 | TFLOPs: 22.77 | 31: iteration 259700/ 476837 | consumed samples: 66483200 | consumed tokens: 136157593600 | elapsed time per iteration (s): 0.69 | learning rate: 9.871E-05 | global batch size: 256 | lm loss: 2.514667E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 368.443 | TFLOPs: 22.29 | 31: iteration 259800/ 476837 | consumed samples: 66508800 | consumed tokens: 136210022400 | elapsed time per iteration (s): 0.68 | learning rate: 9.865E-05 | global batch size: 256 | lm loss: 2.523904E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.777 | TFLOPs: 22.67 | 31: iteration 259900/ 476837 | consumed samples: 66534400 | consumed tokens: 136262451200 | elapsed time per iteration (s): 0.68 | learning rate: 9.859E-05 | global batch size: 256 | lm loss: 2.522375E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.030 | TFLOPs: 22.75 | 0: [2023-04-27 23:02:19,913] [INFO] [logging.py:68:log_dist] [Rank 0] step=260000, skipped=0, lr=[9.853316110479887e-05, 9.853316110479887e-05, 9.853316110479887e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 260000/ 476837 | consumed samples: 66560000 | consumed tokens: 136314880000 | elapsed time per iteration (s): 0.68 | learning rate: 9.853E-05 | global batch size: 256 | lm loss: 2.520383E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.650 | TFLOPs: 22.67 | 0: 
steps: 260000 loss: 2.5581 iter time (s): 0.686 samples/sec: 373.447 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 260000 | lm loss value: 2.911928E+00 | lm loss PPL: 1.839222E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 260000 to checkpoints_1b1250b1b5 0: [2023-04-27 23:02:20,315] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step260000 is begin to save! 0: [2023-04-27 23:02:20,358] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_01-model_00-model_states.pt... 0: [2023-04-27 23:02:20,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_01-model_00-model_states.pt. 0: [2023-04-27 23:02:20,849] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_03-model_00-model_states.pt... 0: [2023-04-27 23:02:20,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_03-model_00-model_states.pt. 0: [2023-04-27 23:02:20,926] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_04-model_00-model_states.pt... 0: [2023-04-27 23:02:21,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_04-model_00-model_states.pt. 0: [2023-04-27 23:02:21,019] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_05-model_00-model_states.pt... 0: [2023-04-27 23:02:21,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_05-model_00-model_states.pt. 
0: [2023-04-27 23:02:21,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_06-model_00-model_states.pt... 0: [2023-04-27 23:02:21,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_06-model_00-model_states.pt. 0: [2023-04-27 23:02:21,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_07-model_00-model_states.pt... 0: [2023-04-27 23:02:21,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_07-model_00-model_states.pt. 0: [2023-04-27 23:02:21,296] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_08-model_00-model_states.pt... 0: [2023-04-27 23:02:21,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_08-model_00-model_states.pt. 0: [2023-04-27 23:02:21,382] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_09-model_00-model_states.pt... 0: [2023-04-27 23:02:21,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_09-model_00-model_states.pt. 0: [2023-04-27 23:02:21,468] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_10-model_00-model_states.pt... 0: [2023-04-27 23:02:21,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_10-model_00-model_states.pt. 0: [2023-04-27 23:02:21,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_11-model_00-model_states.pt... 0: [2023-04-27 23:02:21,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_11-model_00-model_states.pt. 
0: [2023-04-27 23:02:21,643] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_12-model_00-model_states.pt... 0: [2023-04-27 23:02:21,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_12-model_00-model_states.pt. 0: [2023-04-27 23:02:21,728] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_13-model_00-model_states.pt... 0: [2023-04-27 23:02:21,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_13-model_00-model_states.pt. 0: [2023-04-27 23:02:21,819] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_14-model_00-model_states.pt... 0: [2023-04-27 23:02:21,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_14-model_00-model_states.pt. 0: [2023-04-27 23:02:21,909] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_15-model_00-model_states.pt... 0: [2023-04-27 23:02:21,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_15-model_00-model_states.pt. 0: [2023-04-27 23:02:21,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_16-model_00-model_states.pt... 0: [2023-04-27 23:02:22,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_16-model_00-model_states.pt. 0: [2023-04-27 23:02:22,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_17-model_00-model_states.pt... 0: [2023-04-27 23:02:22,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_17-model_00-model_states.pt. 
0: [2023-04-27 23:02:22,173] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_18-model_00-model_states.pt... 0: [2023-04-27 23:02:22,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_18-model_00-model_states.pt. 0: [2023-04-27 23:02:22,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_19-model_00-model_states.pt... 0: [2023-04-27 23:02:22,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_19-model_00-model_states.pt. 0: [2023-04-27 23:02:22,346] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_20-model_00-model_states.pt... 0: [2023-04-27 23:02:22,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_20-model_00-model_states.pt. 0: [2023-04-27 23:02:22,432] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_21-model_00-model_states.pt... 0: [2023-04-27 23:02:22,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_21-model_00-model_states.pt. 0: [2023-04-27 23:02:22,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_22-model_00-model_states.pt... 0: [2023-04-27 23:02:22,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_22-model_00-model_states.pt. 0: [2023-04-27 23:02:22,608] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_23-model_00-model_states.pt... 0: [2023-04-27 23:02:22,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_23-model_00-model_states.pt. 
0: [2023-04-27 23:02:22,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_24-model_00-model_states.pt... 0: [2023-04-27 23:02:22,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_24-model_00-model_states.pt. 0: [2023-04-27 23:02:22,782] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_25-model_00-model_states.pt... 0: [2023-04-27 23:02:22,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_25-model_00-model_states.pt. 0: [2023-04-27 23:02:22,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_26-model_00-model_states.pt... 0: [2023-04-27 23:02:22,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_26-model_00-model_states.pt. 0: [2023-04-27 23:02:22,942] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_27-model_00-model_states.pt... 0: [2023-04-27 23:02:23,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_27-model_00-model_states.pt. 0: [2023-04-27 23:02:23,028] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_28-model_00-model_states.pt... 0: [2023-04-27 23:02:23,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_28-model_00-model_states.pt. 0: [2023-04-27 23:02:23,114] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/layer_30-model_00-model_states.pt... 0: [2023-04-27 23:02:23,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/layer_30-model_00-model_states.pt. 
0: [2023-04-27 23:02:23,118] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step260000/mp_rank_00_model_states.pt 0: [2023-04-27 23:02:23,118] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/mp_rank_00_model_states.pt... 0: [2023-04-27 23:02:23,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/mp_rank_00_model_states.pt. 2: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 10: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 9: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 
20: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 20: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 20: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 20: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 24: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 24: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 24: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 24: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 21: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 21: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 
21: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 26: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 26: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 26: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 30: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 30: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 30: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 30: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 16: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 
16: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 16: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 6: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 4: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 1: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 5: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 2: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 11: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 11: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 11: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 
3: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 10: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 10: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 9: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 9: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 14: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 14: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 14: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 15: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 15: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 
12: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 12: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 12: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 13: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 20: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 20: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 19: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 19: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 19: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 18: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 
18: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 18: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 18: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 24: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 17: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 17: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 27: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 27: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 27: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 27: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 
21: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 23: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 23: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 23: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 29: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 29: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 29: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 25: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 28: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 28: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 
26: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 26: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 26: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 30: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 30: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 30: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 30: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 31: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 16: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 
16: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 16: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 22: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 22: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 22: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 6: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 4: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 1: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 2: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 8: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 8: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 8: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 11: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 3: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 10: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 9: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 9: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 14: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 15: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 12: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 13: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 20: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 20: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 19: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 
19: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 18: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 24: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 17: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 17: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 27: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 27: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 21: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 21: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 23: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 
29: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 25: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 28: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 26: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 31: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 16: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 22: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 22: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 22: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 6: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
0: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 4: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 1: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 8: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 
8: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 8: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 11: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 11: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 10: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 10: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 10: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 9: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 14: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 14: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 14: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 
14: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 15: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 12: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 12: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 12: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 12: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 13: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 19: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 19: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 18: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 18: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 
24: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 17: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 17: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 17: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 27: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 21: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 23: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 23: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 23: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 23: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 
29: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 29: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 25: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 25: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 28: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 28: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 28: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 31: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 16: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 
22: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 6: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 1: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 5: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 8: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 
11: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 10: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 9: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 15: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 15: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 13: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 13: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 19: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 18: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 24: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 17: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 
27: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 21: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 29: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 25: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 28: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 31: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 31: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 31: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 31: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 6: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
4: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 8: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 9: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 15: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 13: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 25: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 25: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 28: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 6: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
0: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 13: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 0: [2023-04-27 23:02:23,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-04-27 23:02:23,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-04-27 23:02:23,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-27 23:02:23,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 0: [2023-04-27 23:02:23,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-27 23:02:23,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-27 23:02:23,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 13: [2023-04-27 23:02:23,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2023-04-27 23:02:23,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 
13: [2023-04-27 23:02:23,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-27 23:02:23,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2023-04-27 23:02:23,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 13: [2023-04-27 23:02:23,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 0: [2023-04-27 23:02:23,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-04-27 23:02:23,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-27 23:02:23,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-27 23:02:23,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 7: [2023-04-27 23:02:23,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-04-27 23:02:23,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-04-27 23:02:23,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2023-04-27 23:02:23,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-04-27 23:02:23,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 7: [2023-04-27 23:02:23,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 0: [2023-04-27 23:02:23,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-04-27 23:02:23,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-27 23:02:23,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 0: [2023-04-27 23:02:23,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-04-27 23:02:23,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-27 23:02:23,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 1: [2023-04-27 23:02:23,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-04-27 23:02:23,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-27 23:02:23,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 1: [2023-04-27 23:02:23,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-04-27 23:02:23,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-27 23:02:23,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 0: [2023-04-27 23:02:23,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-27 23:02:23,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-27 23:02:23,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 1: [2023-04-27 23:02:23,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-04-27 23:02:23,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
1: [2023-04-27 23:02:23,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2023-04-27 23:02:23,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 5: [2023-04-27 23:02:23,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-04-27 23:02:23,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 7: [2023-04-27 23:02:23,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 
17: [2023-04-27 23:02:23,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-27 23:02:23,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-27 23:02:23,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-27 23:02:23,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 17: [2023-04-27 23:02:23,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
0: [2023-04-27 23:02:23,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-27 23:02:23,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-27 23:02:23,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 7: [2023-04-27 23:02:23,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-04-27 23:02:23,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-27 23:02:23,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 1: [2023-04-27 23:02:23,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-04-27 23:02:23,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 13: [2023-04-27 23:02:23,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 
7: [2023-04-27 23:02:23,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 13: [2023-04-27 23:02:23,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2023-04-27 23:02:23,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 17: [2023-04-27 23:02:23,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 1: [2023-04-27 23:02:23,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-04-27 23:02:23,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 13: [2023-04-27 23:02:23,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 1: [2023-04-27 23:02:23,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 13: [2023-04-27 23:02:23,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 1: [2023-04-27 23:02:23,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
13: [2023-04-27 23:02:23,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 1: [2023-04-27 23:02:23,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 13: [2023-04-27 23:02:23,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 1: [2023-04-27 23:02:23,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 13: [2023-04-27 23:02:23,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-27 23:02:23,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-27 23:02:23,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 17: [2023-04-27 23:02:23,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2023-04-27 23:02:23,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-27 23:02:23,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2023-04-27 23:02:23,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 17: [2023-04-27 23:02:23,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
17: [2023-04-27 23:02:23,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 13: [2023-04-27 23:02:23,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-27 23:02:23,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-27 23:02:23,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 13: [2023-04-27 23:02:23,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2023-04-27 23:02:23,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-27 23:02:23,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 13: [2023-04-27 23:02:23,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-27 23:02:23,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 1: [2023-04-27 23:02:23,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-27 23:02:23,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 13: [2023-04-27 23:02:23,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
1: [2023-04-27 23:02:23,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 7: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-04-27 23:02:23,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-27 23:02:23,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 7: [2023-04-27 23:02:23,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-27 23:02:23,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-04-27 23:02:23,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 
12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 12: [2023-04-27 23:02:23,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-27 23:02:23,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-27 23:02:23,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-27 23:02:23,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 12: [2023-04-27 23:02:23,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2023-04-27 23:02:23,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 12: [2023-04-27 23:02:23,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 12: [2023-04-27 23:02:23,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-27 23:02:23,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-27 23:02:23,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 7: [2023-04-27 23:02:23,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-04-27 23:02:23,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-27 23:02:23,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 12: [2023-04-27 23:02:23,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-27 23:02:23,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-27 23:02:23,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 
16: [2023-04-27 23:02:23,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-27 23:02:23,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-27 23:02:23,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-27 23:02:23,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 16: [2023-04-27 23:02:23,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
16: [2023-04-27 23:02:23,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-27 23:02:23,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-27 23:02:23,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 16: [2023-04-27 23:02:23,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 
10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 10: [2023-04-27 23:02:23,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 10: [2023-04-27 23:02:23,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2023-04-27 23:02:23,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-27 23:02:23,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2023-04-27 23:02:23,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 10: [2023-04-27 23:02:23,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-27 23:02:23,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-27 23:02:23,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 10: [2023-04-27 23:02:23,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 5: [2023-04-27 23:02:23,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-27 23:02:23,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 5: [2023-04-27 23:02:23,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-04-27 23:02:23,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-27 23:02:23,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 5: [2023-04-27 23:02:23,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-04-27 23:02:23,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-27 23:02:23,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 5: [2023-04-27 23:02:23,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-27 23:02:23,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-27 23:02:23,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-04-27 23:02:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-27 23:02:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-27 23:02:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-27 23:02:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-27 23:02:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-27 23:02:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 3: [2023-04-27 23:02:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-27 23:02:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 3: [2023-04-27 23:02:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 26: [2023-04-27 23:02:23,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2023-04-27 23:02:23,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 
26: [2023-04-27 23:02:23,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 26: [2023-04-27 23:02:23,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-27 23:02:23,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-27 23:02:23,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-27 23:02:23,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2023-04-27 23:02:23,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-27 23:02:23,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 26: [2023-04-27 23:02:23,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 26: [2023-04-27 23:02:23,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 26: [2023-04-27 23:02:23,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 5: [2023-04-27 23:02:23,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-04-27 23:02:23,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-27 23:02:23,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 5: [2023-04-27 23:02:23,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-04-27 23:02:23,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-04-27 23:02:23,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 26: [2023-04-27 23:02:23,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-27 23:02:23,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-27 23:02:23,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-27 23:02:23,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2023-04-27 23:02:23,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-27 23:02:23,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
26: [2023-04-27 23:02:23,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-27 23:02:23,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 26: [2023-04-27 23:02:23,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 26: [2023-04-27 23:02:23,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-27 23:02:23,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-27 23:02:23,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 5: [2023-04-27 23:02:23,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-04-27 23:02:23,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 5: [2023-04-27 23:02:23,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-04-27 23:02:23,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-27 23:02:23,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 5: [2023-04-27 23:02:23,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-04-27 23:02:23,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-27 23:02:23,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 19: [2023-04-27 23:02:23,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-27 23:02:23,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-27 23:02:23,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-27 23:02:23,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 19: [2023-04-27 23:02:23,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2023-04-27 23:02:23,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2023-04-27 23:02:23,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-27 23:02:23,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 
19: [2023-04-27 23:02:23,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-27 23:02:23,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-27 23:02:23,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-27 23:02:23,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-27 23:02:23,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-27 23:02:23,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-27 23:02:23,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-27 23:02:23,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-27 23:02:23,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 19: [2023-04-27 23:02:23,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 19: [2023-04-27 23:02:23,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
19: [2023-04-27 23:02:23,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 19: [2023-04-27 23:02:23,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 19: [2023-04-27 23:02:23,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 19: [2023-04-27 23:02:23,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 19: [2023-04-27 23:02:23,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 7: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-27 23:02:23,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 
7: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-27 23:02:23,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-27 23:02:23,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-27 23:02:23,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-27 23:02:23,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-27 23:02:23,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-27 23:02:23,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-27 23:02:23,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-27 23:02:23,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 27: [2023-04-27 23:02:23,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 27: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-27 23:02:23,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-27 23:02:23,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-27 23:02:23,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-27 23:02:23,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-04-27 23:02:23,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-27 23:02:23,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-27 23:02:23,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 2: [2023-04-27 23:02:23,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 2: [2023-04-27 23:02:23,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 15: [2023-04-27 23:02:23,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-27 23:02:23,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 
15: [2023-04-27 23:02:23,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-27 23:02:23,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 15: [2023-04-27 23:02:23,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2023-04-27 23:02:23,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-27 23:02:23,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-27 23:02:23,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 15: [2023-04-27 23:02:23,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 15: [2023-04-27 23:02:23,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 15: [2023-04-27 23:02:23,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-27 23:02:23,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 15: [2023-04-27 23:02:23,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 15: [2023-04-27 23:02:23,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 
15: [2023-04-27 23:02:23,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-27 23:02:23,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-27 23:02:23,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 15: [2023-04-27 23:02:23,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-04-27 23:02:23,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-27 23:02:23,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-27 23:02:23,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-27 23:02:23,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 6: [2023-04-27 23:02:23,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-27 23:02:23,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 6: [2023-04-27 23:02:23,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
15: [2023-04-27 23:02:23,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-27 23:02:23,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-27 23:02:23,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 
29: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 29: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 29: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 11: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 11: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 11: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 
11: [2023-04-27 23:02:23,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 11: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2023-04-27 23:02:23,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-27 23:02:23,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 11: [2023-04-27 23:02:23,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
11: [2023-04-27 23:02:23,453] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-27 23:02:23,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 11: [2023-04-27 23:02:23,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 11: [2023-04-27 23:02:23,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 11: [2023-04-27 23:02:23,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 11: [2023-04-27 23:02:23,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 11: [2023-04-27 23:02:23,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 23: [2023-04-27 23:02:23,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 23: [2023-04-27 23:02:23,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 23: [2023-04-27 23:02:23,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2023-04-27 23:02:23,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 23: [2023-04-27 23:02:23,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 
23: [2023-04-27 23:02:23,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 23: [2023-04-27 23:02:23,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2023-04-27 23:02:23,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-27 23:02:23,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-27 23:02:23,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2023-04-27 23:02:23,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-27 23:02:23,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-27 23:02:23,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2023-04-27 23:02:23,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-27 23:02:23,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-27 23:02:23,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step260000 is ready now! 23: [2023-04-27 23:02:23,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 23: [2023-04-27 23:02:23,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 23: [2023-04-27 23:02:23,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-27 23:02:23,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 23: [2023-04-27 23:02:23,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 23: [2023-04-27 23:02:23,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 23: [2023-04-27 23:02:23,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 23: [2023-04-27 23:02:23,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 
31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 31: [2023-04-27 23:02:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-27 23:02:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-27 23:02:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-27 23:02:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-27 23:02:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-27 23:02:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-27 23:02:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-27 23:02:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 31: [2023-04-27 23:02:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 
20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 20: [2023-04-27 23:02:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-27 23:02:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-27 23:02:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2023-04-27 23:02:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 
20: [2023-04-27 23:02:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-27 23:02:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 20: [2023-04-27 23:02:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 20: [2023-04-27 23:02:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 20: [2023-04-27 23:02:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 
9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 
9: [2023-04-27 23:02:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-27 23:02:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-27 23:02:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-27 23:02:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-27 23:02:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 9: [2023-04-27 23:02:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-27 23:02:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-27 23:02:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 9: [2023-04-27 23:02:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 
14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-27 23:02:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-27 23:02:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2023-04-27 23:02:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-27 23:02:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-27 23:02:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-27 23:02:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
14: [2023-04-27 23:02:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2023-04-27 23:02:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 14: [2023-04-27 23:02:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 
24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 24: [2023-04-27 23:02:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-27 23:02:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-27 23:02:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-27 23:02:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2023-04-27 23:02:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-27 23:02:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-27 23:02:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2023-04-27 23:02:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step260000 is ready now! 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 24: [2023-04-27 23:02:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-04-27 23:02:23,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 4: [2023-04-27 23:02:23,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-27 23:02:23,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-04-27 23:02:23,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-04-27 23:02:23,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-04-27 23:02:23,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-27 23:02:23,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-04-27 23:02:23,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 4: [2023-04-27 23:02:23,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 1: [2023-04-27 23:02:23,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-27 23:02:23,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-27 23:02:23,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 
18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2023-04-27 23:02:23,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-27 23:02:23,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-27 23:02:23,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-27 23:02:23,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-27 23:02:23,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-27 23:02:23,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-27 23:02:23,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-27 23:02:23,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 18: [2023-04-27 23:02:23,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 25: [2023-04-27 23:02:23,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 
25: [2023-04-27 23:02:23,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2023-04-27 23:02:23,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-27 23:02:23,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-27 23:02:23,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2023-04-27 23:02:23,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 25: [2023-04-27 23:02:23,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 25: [2023-04-27 23:02:23,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 
25: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 25: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 25: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 25: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
25: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 25: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 25: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 25: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 25: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 15: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 
28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 15: [2023-04-27 23:02:23,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 15: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
28: [2023-04-27 23:02:23,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-27 23:02:23,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-27 23:02:23,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-27 23:02:23,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-27 23:02:23,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-27 23:02:23,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
28: [2023-04-27 23:02:23,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2023-04-27 23:02:23,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-27 23:02:23,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 
8: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 8: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-27 23:02:23,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 8: [2023-04-27 23:02:23,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 28: [2023-04-27 23:02:23,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-27 23:02:23,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-27 23:02:23,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 
22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2023-04-27 23:02:23,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-27 23:02:23,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2023-04-27 23:02:23,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-27 23:02:23,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-27 23:02:23,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-27 23:02:23,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-27 23:02:23,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-27 23:02:23,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 22: [2023-04-27 23:02:23,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 30: [2023-04-27 23:02:23,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-27 23:02:23,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-27 23:02:23,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 
30: [2023-04-27 23:02:23,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-27 23:02:23,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-27 23:02:23,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-27 23:02:23,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-27 23:02:23,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-27 23:02:23,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-27 23:02:23,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-27 23:02:23,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 30: [2023-04-27 23:02:23,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 30: [2023-04-27 23:02:23,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 30: [2023-04-27 23:02:23,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 30: [2023-04-27 23:02:23,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
30: [2023-04-27 23:02:23,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 30: [2023-04-27 23:02:23,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-27 23:02:23,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 30: [2023-04-27 23:02:23,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-27 23:02:23,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 30: [2023-04-27 23:02:23,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-27 23:02:23,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 30: [2023-04-27 23:02:23,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-27 23:02:23,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 21: [2023-04-27 23:02:23,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 21: [2023-04-27 23:02:23,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 
21: [2023-04-27 23:02:23,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-27 23:02:23,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 21: [2023-04-27 23:02:23,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-27 23:02:23,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 21: [2023-04-27 23:02:23,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2023-04-27 23:02:23,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 
21: [2023-04-27 23:02:23,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-27 23:02:23,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-27 23:02:23,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-27 23:02:23,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-27 23:02:23,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-27 23:02:23,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-27 23:02:23,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-27 23:02:23,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-27 23:02:23,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 21: [2023-04-27 23:02:23,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 21: [2023-04-27 23:02:23,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
21: [2023-04-27 23:02:23,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 21: [2023-04-27 23:02:23,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 21: [2023-04-27 23:02:23,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 21: [2023-04-27 23:02:23,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 21: [2023-04-27 23:02:23,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 6: [2023-04-27 23:02:23,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-27 23:02:23,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-27 23:02:23,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 6: [2023-04-27 23:02:23,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-27 23:02:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step260000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-27 23:02:23,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step260000 is ready now! 
0: successfully saved checkpoint at iteration 260000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 3454.68 31: iteration 260100/ 476837 | consumed samples: 66585600 | consumed tokens: 136367308800 | elapsed time per iteration (s): 0.73 | learning rate: 9.847E-05 | global batch size: 256 | lm loss: 2.516741E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 353.031 | TFLOPs: 21.36 | 31: iteration 260200/ 476837 | consumed samples: 66611200 | consumed tokens: 136419737600 | elapsed time per iteration (s): 0.68 | learning rate: 9.841E-05 | global batch size: 256 | lm loss: 2.515451E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.464 | TFLOPs: 22.78 | 31: iteration 260300/ 476837 | consumed samples: 66636800 | consumed tokens: 136472166400 | elapsed time per iteration (s): 0.68 | learning rate: 9.835E-05 | global batch size: 256 | lm loss: 2.521016E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.704 | TFLOPs: 22.79 | 31: iteration 260400/ 476837 | consumed samples: 66662400 | consumed tokens: 136524595200 | elapsed time per iteration (s): 0.68 | learning rate: 9.830E-05 | global batch size: 256 | lm loss: 2.519572E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.289 | TFLOPs: 22.76 | 31: iteration 260500/ 476837 | consumed samples: 66688000 | consumed tokens: 136577024000 | elapsed time per iteration (s): 0.68 | learning rate: 9.824E-05 | global batch size: 256 | lm loss: 2.521202E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.731 | TFLOPs: 22.67 | 31: iteration 260600/ 476837 | consumed samples: 66713600 | consumed tokens: 136629452800 | elapsed time per 
iteration (s): 0.68 | learning rate: 9.818E-05 | global batch size: 256 | lm loss: 2.519138E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.940 | TFLOPs: 22.80 | 31: iteration 260700/ 476837 | consumed samples: 66739200 | consumed tokens: 136681881600 | elapsed time per iteration (s): 0.68 | learning rate: 9.812E-05 | global batch size: 256 | lm loss: 2.515408E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.995 | TFLOPs: 22.81 | 31: iteration 260800/ 476837 | consumed samples: 66764800 | consumed tokens: 136734310400 | elapsed time per iteration (s): 0.69 | learning rate: 9.806E-05 | global batch size: 256 | lm loss: 2.517736E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.074 | TFLOPs: 22.57 | 31: iteration 260900/ 476837 | consumed samples: 66790400 | consumed tokens: 136786739200 | elapsed time per iteration (s): 0.72 | learning rate: 9.800E-05 | global batch size: 256 | lm loss: 2.515565E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 357.689 | TFLOPs: 21.64 | 31: iteration 261000/ 476837 | consumed samples: 66816000 | consumed tokens: 136839168000 | elapsed time per iteration (s): 0.77 | learning rate: 9.794E-05 | global batch size: 256 | lm loss: 2.514318E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 333.704 | TFLOPs: 20.19 | 31: iteration 261100/ 476837 | consumed samples: 66841600 | consumed tokens: 136891596800 | elapsed time per iteration (s): 0.68 | learning rate: 9.788E-05 | global batch size: 256 | lm loss: 2.512973E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.866 | 
TFLOPs: 22.80 | 31: iteration 261200/ 476837 | consumed samples: 66867200 | consumed tokens: 136944025600 | elapsed time per iteration (s): 0.68 | learning rate: 9.782E-05 | global batch size: 256 | lm loss: 2.519663E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.943 | TFLOPs: 22.80 | 31: iteration 261300/ 476837 | consumed samples: 66892800 | consumed tokens: 136996454400 | elapsed time per iteration (s): 0.68 | learning rate: 9.776E-05 | global batch size: 256 | lm loss: 2.517901E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.036 | TFLOPs: 22.81 | 31: iteration 261400/ 476837 | consumed samples: 66918400 | consumed tokens: 137048883200 | elapsed time per iteration (s): 0.68 | learning rate: 9.770E-05 | global batch size: 256 | lm loss: 2.516529E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.738 | TFLOPs: 22.79 | 31: iteration 261500/ 476837 | consumed samples: 66944000 | consumed tokens: 137101312000 | elapsed time per iteration (s): 0.68 | learning rate: 9.764E-05 | global batch size: 256 | lm loss: 2.519419E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.117 | TFLOPs: 22.81 | 31: iteration 261600/ 476837 | consumed samples: 66969600 | consumed tokens: 137153740800 | elapsed time per iteration (s): 0.70 | learning rate: 9.758E-05 | global batch size: 256 | lm loss: 2.520038E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.306 | TFLOPs: 22.16 | 31: iteration 261700/ 476837 | consumed samples: 66995200 | consumed tokens: 137206169600 | elapsed time per iteration (s): 0.69 | learning rate: 9.752E-05 | global batch size: 256 | lm loss: 2.520505E+00 | grad norm: 
0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.807 | TFLOPs: 22.55 | 31: iteration 261800/ 476837 | consumed samples: 67020800 | consumed tokens: 137258598400 | elapsed time per iteration (s): 0.68 | learning rate: 9.746E-05 | global batch size: 256 | lm loss: 2.520446E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.027 | TFLOPs: 22.75 | 31: iteration 261900/ 476837 | consumed samples: 67046400 | consumed tokens: 137311027200 | elapsed time per iteration (s): 0.68 | learning rate: 9.741E-05 | global batch size: 256 | lm loss: 2.516145E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.795 | TFLOPs: 22.80 | 0: [2023-04-27 23:25:19,566] [INFO] [logging.py:68:log_dist] [Rank 0] step=262000, skipped=0, lr=[9.734608398830172e-05, 9.734608398830172e-05, 9.734608398830172e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 262000/ 476837 | consumed samples: 67072000 | consumed tokens: 137363456000 | elapsed time per iteration (s): 0.68 | learning rate: 9.735E-05 | global batch size: 256 | lm loss: 2.514475E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.741 | TFLOPs: 22.79 | 0: steps: 262000 loss: 2.5193 iter time (s): 0.686 samples/sec: 373.384 31: iteration 262100/ 476837 | consumed samples: 67097600 | consumed tokens: 137415884800 | elapsed time per iteration (s): 0.68 | learning rate: 9.729E-05 | global batch size: 256 | lm loss: 2.518012E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.885 | TFLOPs: 22.80 | 31: iteration 262200/ 476837 | consumed samples: 67123200 | consumed tokens: 137468313600 | elapsed time per iteration (s): 0.68 | learning rate: 9.723E-05 | global 
batch size: 256 | lm loss: 2.516512E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.823 | TFLOPs: 22.80 | 31: iteration 262300/ 476837 | consumed samples: 67148800 | consumed tokens: 137520742400 | elapsed time per iteration (s): 0.68 | learning rate: 9.717E-05 | global batch size: 256 | lm loss: 2.519331E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.777 | TFLOPs: 22.79 | 31: iteration 262400/ 476837 | consumed samples: 67174400 | consumed tokens: 137573171200 | elapsed time per iteration (s): 0.68 | learning rate: 9.711E-05 | global batch size: 256 | lm loss: 2.516909E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.921 | TFLOPs: 22.80 | 31: iteration 262500/ 476837 | consumed samples: 67200000 | consumed tokens: 137625600000 | elapsed time per iteration (s): 0.69 | learning rate: 9.705E-05 | global batch size: 256 | lm loss: 2.513593E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.151 | TFLOPs: 22.51 | 31: iteration 262600/ 476837 | consumed samples: 67225600 | consumed tokens: 137678028800 | elapsed time per iteration (s): 0.68 | learning rate: 9.699E-05 | global batch size: 256 | lm loss: 2.516594E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.728 | TFLOPs: 22.79 | 31: iteration 262700/ 476837 | consumed samples: 67251200 | consumed tokens: 137730457600 | elapsed time per iteration (s): 0.68 | learning rate: 9.693E-05 | global batch size: 256 | lm loss: 2.519902E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.001 | TFLOPs: 22.81 | 31: iteration 262800/ 476837 | consumed 
samples: 67276800 | consumed tokens: 137782886400 | elapsed time per iteration (s): 0.68 | learning rate: 9.687E-05 | global batch size: 256 | lm loss: 2.516276E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.927 | TFLOPs: 22.80 | 31: iteration 262900/ 476837 | consumed samples: 67302400 | consumed tokens: 137835315200 | elapsed time per iteration (s): 0.68 | learning rate: 9.681E-05 | global batch size: 256 | lm loss: 2.516607E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.979 | TFLOPs: 22.81 | 31: iteration 263000/ 476837 | consumed samples: 67328000 | consumed tokens: 137887744000 | elapsed time per iteration (s): 0.68 | learning rate: 9.675E-05 | global batch size: 256 | lm loss: 2.513029E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.628 | TFLOPs: 22.79 | 31: iteration 263100/ 476837 | consumed samples: 67353600 | consumed tokens: 137940172800 | elapsed time per iteration (s): 0.68 | learning rate: 9.669E-05 | global batch size: 256 | lm loss: 2.517974E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.022 | TFLOPs: 22.75 | 31: iteration 263200/ 476837 | consumed samples: 67379200 | consumed tokens: 137992601600 | elapsed time per iteration (s): 0.68 | learning rate: 9.663E-05 | global batch size: 256 | lm loss: 2.515802E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.846 | TFLOPs: 22.80 | 31: iteration 263300/ 476837 | consumed samples: 67404800 | consumed tokens: 138045030400 | elapsed time per iteration (s): 0.68 | learning rate: 9.658E-05 | global batch size: 256 | lm loss: 2.518815E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 376.971 | TFLOPs: 22.81 | 31: iteration 263400/ 476837 | consumed samples: 67430400 | consumed tokens: 138097459200 | elapsed time per iteration (s): 0.68 | learning rate: 9.652E-05 | global batch size: 256 | lm loss: 2.512723E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.532 | TFLOPs: 22.78 | 31: iteration 263500/ 476837 | consumed samples: 67456000 | consumed tokens: 138149888000 | elapsed time per iteration (s): 0.79 | learning rate: 9.646E-05 | global batch size: 256 | lm loss: 2.513770E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 323.984 | TFLOPs: 19.60 | 31: iteration 263600/ 476837 | consumed samples: 67481600 | consumed tokens: 138202316800 | elapsed time per iteration (s): 0.68 | learning rate: 9.640E-05 | global batch size: 256 | lm loss: 2.517136E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.168 | TFLOPs: 22.76 | 31: iteration 263700/ 476837 | consumed samples: 67507200 | consumed tokens: 138254745600 | elapsed time per iteration (s): 0.68 | learning rate: 9.634E-05 | global batch size: 256 | lm loss: 2.515340E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.687 | TFLOPs: 22.79 | 31: iteration 263800/ 476837 | consumed samples: 67532800 | consumed tokens: 138307174400 | elapsed time per iteration (s): 0.68 | learning rate: 9.628E-05 | global batch size: 256 | lm loss: 2.515488E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.801 | TFLOPs: 22.80 | 31: iteration 263900/ 476837 | consumed samples: 67558400 | consumed tokens: 138359603200 | elapsed time per iteration (s): 0.68 | learning rate: 9.622E-05 
| global batch size: 256 | lm loss: 2.512491E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.533 | TFLOPs: 22.78 | 0: [2023-04-27 23:48:10,734] [INFO] [logging.py:68:log_dist] [Rank 0] step=264000, skipped=0, lr=[9.616124852124767e-05, 9.616124852124767e-05, 9.616124852124767e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 264000/ 476837 | consumed samples: 67584000 | consumed tokens: 138412032000 | elapsed time per iteration (s): 0.68 | learning rate: 9.616E-05 | global batch size: 256 | lm loss: 2.513120E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.780 | TFLOPs: 22.73 | 0: steps: 264000 loss: 2.4828 iter time (s): 0.683 samples/sec: 374.687 31: iteration 264100/ 476837 | consumed samples: 67609600 | consumed tokens: 138464460800 | elapsed time per iteration (s): 0.69 | learning rate: 9.610E-05 | global batch size: 256 | lm loss: 2.513915E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.317 | TFLOPs: 22.46 | 31: iteration 264200/ 476837 | consumed samples: 67635200 | consumed tokens: 138516889600 | elapsed time per iteration (s): 0.68 | learning rate: 9.604E-05 | global batch size: 256 | lm loss: 2.515110E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.154 | TFLOPs: 22.76 | 31: iteration 264300/ 476837 | consumed samples: 67660800 | consumed tokens: 138569318400 | elapsed time per iteration (s): 0.70 | learning rate: 9.598E-05 | global batch size: 256 | lm loss: 2.514126E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.681 | TFLOPs: 22.06 | 31: iteration 264400/ 476837 | consumed samples: 67686400 | consumed tokens: 138621747200 | elapsed 
time per iteration (s): 0.69 | learning rate: 9.592E-05 | global batch size: 256 | lm loss: 2.514261E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.242 | TFLOPs: 22.58 | 31: iteration 264500/ 476837 | consumed samples: 67712000 | consumed tokens: 138674176000 | elapsed time per iteration (s): 0.68 | learning rate: 9.587E-05 | global batch size: 256 | lm loss: 2.514479E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.246 | TFLOPs: 22.70 | 31: iteration 264600/ 476837 | consumed samples: 67737600 | consumed tokens: 138726604800 | elapsed time per iteration (s): 0.68 | learning rate: 9.581E-05 | global batch size: 256 | lm loss: 2.514654E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.441 | TFLOPs: 22.71 | 31: iteration 264700/ 476837 | consumed samples: 67763200 | consumed tokens: 138779033600 | elapsed time per iteration (s): 0.68 | learning rate: 9.575E-05 | global batch size: 256 | lm loss: 2.511735E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.215 | TFLOPs: 22.82 | 31: iteration 264800/ 476837 | consumed samples: 67788800 | consumed tokens: 138831462400 | elapsed time per iteration (s): 0.68 | learning rate: 9.569E-05 | global batch size: 256 | lm loss: 2.515092E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.081 | TFLOPs: 22.81 | 31: iteration 264900/ 476837 | consumed samples: 67814400 | consumed tokens: 138883891200 | elapsed time per iteration (s): 0.69 | learning rate: 9.563E-05 | global batch size: 256 | lm loss: 2.513670E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.728 
| TFLOPs: 22.37 | 31: iteration 265000/ 476837 | consumed samples: 67840000 | consumed tokens: 138936320000 | elapsed time per iteration (s): 0.68 | learning rate: 9.557E-05 | global batch size: 256 | lm loss: 2.514982E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.030 | TFLOPs: 22.75 | 31: iteration 265100/ 476837 | consumed samples: 67865600 | consumed tokens: 138988748800 | elapsed time per iteration (s): 0.68 | learning rate: 9.551E-05 | global batch size: 256 | lm loss: 2.516490E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.970 | TFLOPs: 22.75 | 31: iteration 265200/ 476837 | consumed samples: 67891200 | consumed tokens: 139041177600 | elapsed time per iteration (s): 0.68 | learning rate: 9.545E-05 | global batch size: 256 | lm loss: 2.515125E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.872 | TFLOPs: 22.80 | 31: iteration 265300/ 476837 | consumed samples: 67916800 | consumed tokens: 139093606400 | elapsed time per iteration (s): 0.68 | learning rate: 9.539E-05 | global batch size: 256 | lm loss: 2.513225E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.065 | TFLOPs: 22.81 | 31: iteration 265400/ 476837 | consumed samples: 67942400 | consumed tokens: 139146035200 | elapsed time per iteration (s): 0.68 | learning rate: 9.533E-05 | global batch size: 256 | lm loss: 2.509580E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.122 | TFLOPs: 22.75 | 31: iteration 265500/ 476837 | consumed samples: 67968000 | consumed tokens: 139198464000 | elapsed time per iteration (s): 0.68 | learning rate: 9.527E-05 | global batch size: 256 | lm loss: 2.514204E+00 | grad 
norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.993 | TFLOPs: 22.81 | 31: iteration 265600/ 476837 | consumed samples: 67993600 | consumed tokens: 139250892800 | elapsed time per iteration (s): 0.68 | learning rate: 9.522E-05 | global batch size: 256 | lm loss: 2.518160E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.150 | TFLOPs: 22.76 | 31: iteration 265700/ 476837 | consumed samples: 68019200 | consumed tokens: 139303321600 | elapsed time per iteration (s): 0.68 | learning rate: 9.516E-05 | global batch size: 256 | lm loss: 2.510861E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.998 | TFLOPs: 22.69 | 31: iteration 265800/ 476837 | consumed samples: 68044800 | consumed tokens: 139355750400 | elapsed time per iteration (s): 0.68 | learning rate: 9.510E-05 | global batch size: 256 | lm loss: 2.513065E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.861 | TFLOPs: 22.74 | 31: iteration 265900/ 476837 | consumed samples: 68070400 | consumed tokens: 139408179200 | elapsed time per iteration (s): 0.68 | learning rate: 9.504E-05 | global batch size: 256 | lm loss: 2.511285E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.012 | TFLOPs: 22.81 | 0: [2023-04-28 00:11:02,591] [INFO] [logging.py:68:log_dist] [Rank 0] step=266000, skipped=0, lr=[9.49788645980095e-05, 9.49788645980095e-05, 9.49788645980095e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 266000/ 476837 | consumed samples: 68096000 | consumed tokens: 139460608000 | elapsed time per iteration (s): 0.74 | learning rate: 9.498E-05 | global batch size: 256 | lm loss: 2.511023E+00 | grad norm: 0.357 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 344.235 | TFLOPs: 20.83 | 0: steps: 266000 loss: 2.5209 iter time (s): 0.683 samples/sec: 374.787 31: iteration 266100/ 476837 | consumed samples: 68121600 | consumed tokens: 139513036800 | elapsed time per iteration (s): 0.75 | learning rate: 9.492E-05 | global batch size: 256 | lm loss: 2.514332E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 343.306 | TFLOPs: 20.77 | 31: iteration 266200/ 476837 | consumed samples: 68147200 | consumed tokens: 139565465600 | elapsed time per iteration (s): 0.69 | learning rate: 9.486E-05 | global batch size: 256 | lm loss: 2.511945E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.579 | TFLOPs: 22.42 | 31: iteration 266300/ 476837 | consumed samples: 68172800 | consumed tokens: 139617894400 | elapsed time per iteration (s): 0.68 | learning rate: 9.480E-05 | global batch size: 256 | lm loss: 2.515529E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.201 | TFLOPs: 22.82 | 31: iteration 266400/ 476837 | consumed samples: 68198400 | consumed tokens: 139670323200 | elapsed time per iteration (s): 0.68 | learning rate: 9.474E-05 | global batch size: 256 | lm loss: 2.512827E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.816 | TFLOPs: 22.80 | 31: iteration 266500/ 476837 | consumed samples: 68224000 | consumed tokens: 139722752000 | elapsed time per iteration (s): 0.69 | learning rate: 9.468E-05 | global batch size: 256 | lm loss: 2.512442E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.293 | TFLOPs: 22.46 | 31: iteration 266600/ 476837 | consumed 
samples: 68249600 | consumed tokens: 139775180800 | elapsed time per iteration (s): 0.68 | learning rate: 9.462E-05 | global batch size: 256 | lm loss: 2.514225E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.827 | TFLOPs: 22.74 | 31: iteration 266700/ 476837 | consumed samples: 68275200 | consumed tokens: 139827609600 | elapsed time per iteration (s): 0.68 | learning rate: 9.457E-05 | global batch size: 256 | lm loss: 2.516497E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.222 | TFLOPs: 22.76 | 31: iteration 266800/ 476837 | consumed samples: 68300800 | consumed tokens: 139880038400 | elapsed time per iteration (s): 0.68 | learning rate: 9.451E-05 | global batch size: 256 | lm loss: 2.515084E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.074 | TFLOPs: 22.81 | 31: iteration 266900/ 476837 | consumed samples: 68326400 | consumed tokens: 139932467200 | elapsed time per iteration (s): 0.68 | learning rate: 9.445E-05 | global batch size: 256 | lm loss: 2.509660E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.200 | TFLOPs: 22.82 | 31: iteration 267000/ 476837 | consumed samples: 68352000 | consumed tokens: 139984896000 | elapsed time per iteration (s): 0.68 | learning rate: 9.439E-05 | global batch size: 256 | lm loss: 2.512391E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.148 | TFLOPs: 22.82 | 31: iteration 267100/ 476837 | consumed samples: 68377600 | consumed tokens: 140037324800 | elapsed time per iteration (s): 0.68 | learning rate: 9.433E-05 | global batch size: 256 | lm loss: 2.514284E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 377.101 | TFLOPs: 22.81 | 31: iteration 267200/ 476837 | consumed samples: 68403200 | consumed tokens: 140089753600 | elapsed time per iteration (s): 0.68 | learning rate: 9.427E-05 | global batch size: 256 | lm loss: 2.512427E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.084 | TFLOPs: 22.81 | 31: iteration 267300/ 476837 | consumed samples: 68428800 | consumed tokens: 140142182400 | elapsed time per iteration (s): 0.68 | learning rate: 9.421E-05 | global batch size: 256 | lm loss: 2.513357E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.607 | TFLOPs: 22.66 | 31: iteration 267400/ 476837 | consumed samples: 68454400 | consumed tokens: 140194611200 | elapsed time per iteration (s): 0.68 | learning rate: 9.415E-05 | global batch size: 256 | lm loss: 2.507175E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.757 | TFLOPs: 22.67 | 31: iteration 267500/ 476837 | consumed samples: 68480000 | consumed tokens: 140247040000 | elapsed time per iteration (s): 0.68 | learning rate: 9.409E-05 | global batch size: 256 | lm loss: 2.513336E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.217 | TFLOPs: 22.76 | 31: iteration 267600/ 476837 | consumed samples: 68505600 | consumed tokens: 140299468800 | elapsed time per iteration (s): 0.68 | learning rate: 9.403E-05 | global batch size: 256 | lm loss: 2.515776E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.190 | TFLOPs: 22.82 | 31: iteration 267700/ 476837 | consumed samples: 68531200 | consumed tokens: 140351897600 | elapsed time per iteration (s): 0.68 | learning rate: 9.398E-05 
| global batch size: 256 | lm loss: 2.513403E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.268 | TFLOPs: 22.76 | 31: iteration 267800/ 476837 | consumed samples: 68556800 | consumed tokens: 140404326400 | elapsed time per iteration (s): 0.68 | learning rate: 9.392E-05 | global batch size: 256 | lm loss: 2.510439E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.004 | TFLOPs: 22.81 | 31: iteration 267900/ 476837 | consumed samples: 68582400 | consumed tokens: 140456755200 | elapsed time per iteration (s): 0.68 | learning rate: 9.386E-05 | global batch size: 256 | lm loss: 2.514866E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.218 | TFLOPs: 22.82 | 0: [2023-04-28 00:33:50,797] [INFO] [logging.py:68:log_dist] [Rank 0] step=268000, skipped=0, lr=[9.379914167866744e-05, 9.379914167866744e-05, 9.379914167866744e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 268000/ 476837 | consumed samples: 68608000 | consumed tokens: 140509184000 | elapsed time per iteration (s): 0.68 | learning rate: 9.380E-05 | global batch size: 256 | lm loss: 2.512347E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.270 | TFLOPs: 22.82 | 0: steps: 268000 loss: 2.4920 iter time (s): 0.681 samples/sec: 376.179 31: iteration 268100/ 476837 | consumed samples: 68633600 | consumed tokens: 140561612800 | elapsed time per iteration (s): 0.69 | learning rate: 9.374E-05 | global batch size: 256 | lm loss: 2.508887E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.136 | TFLOPs: 22.45 | 31: iteration 268200/ 476837 | consumed samples: 68659200 | consumed tokens: 140614041600 | elapsed 
time per iteration (s): 0.68 | learning rate: 9.368E-05 | global batch size: 256 | lm loss: 2.509388E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.183 | TFLOPs: 22.70 | 31: iteration 268300/ 476837 | consumed samples: 68684800 | consumed tokens: 140666470400 | elapsed time per iteration (s): 0.68 | learning rate: 9.362E-05 | global batch size: 256 | lm loss: 2.509342E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.973 | TFLOPs: 22.81 | 31: iteration 268400/ 476837 | consumed samples: 68710400 | consumed tokens: 140718899200 | elapsed time per iteration (s): 0.68 | learning rate: 9.356E-05 | global batch size: 256 | lm loss: 2.513375E+00 | grad norm: 0.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.933 | TFLOPs: 22.80 | 31: iteration 268500/ 476837 | consumed samples: 68736000 | consumed tokens: 140771328000 | elapsed time per iteration (s): 0.68 | learning rate: 9.350E-05 | global batch size: 256 | lm loss: 2.511292E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.905 | TFLOPs: 22.80 | 31: iteration 268600/ 476837 | consumed samples: 68761600 | consumed tokens: 140823756800 | elapsed time per iteration (s): 0.77 | learning rate: 9.345E-05 | global batch size: 256 | lm loss: 2.505742E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 333.536 | TFLOPs: 20.18 | 31: iteration 268700/ 476837 | consumed samples: 68787200 | consumed tokens: 140876185600 | elapsed time per iteration (s): 0.72 | learning rate: 9.339E-05 | global batch size: 256 | lm loss: 2.511831E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 355.257 
| TFLOPs: 21.49 | 31: iteration 268800/ 476837 | consumed samples: 68812800 | consumed tokens: 140928614400 | elapsed time per iteration (s): 0.69 | learning rate: 9.333E-05 | global batch size: 256 | lm loss: 2.510425E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.028 | TFLOPs: 22.57 | 31: iteration 268900/ 476837 | consumed samples: 68838400 | consumed tokens: 140981043200 | elapsed time per iteration (s): 0.68 | learning rate: 9.327E-05 | global batch size: 256 | lm loss: 2.512306E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.076 | TFLOPs: 22.81 | 31: iteration 269000/ 476837 | consumed samples: 68864000 | consumed tokens: 141033472000 | elapsed time per iteration (s): 0.68 | learning rate: 9.321E-05 | global batch size: 256 | lm loss: 2.508514E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.515 | TFLOPs: 22.78 | 31: iteration 269100/ 476837 | consumed samples: 68889600 | consumed tokens: 141085900800 | elapsed time per iteration (s): 0.68 | learning rate: 9.315E-05 | global batch size: 256 | lm loss: 2.511385E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.382 | TFLOPs: 22.77 | 31: iteration 269200/ 476837 | consumed samples: 68915200 | consumed tokens: 141138329600 | elapsed time per iteration (s): 0.68 | learning rate: 9.309E-05 | global batch size: 256 | lm loss: 2.510895E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.822 | TFLOPs: 22.74 | 31: iteration 269300/ 476837 | consumed samples: 68940800 | consumed tokens: 141190758400 | elapsed time per iteration (s): 0.68 | learning rate: 9.303E-05 | global batch size: 256 | lm loss: 2.514480E+00 | grad 
norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.551 | TFLOPs: 22.78 | 31: iteration 269400/ 476837 | consumed samples: 68966400 | consumed tokens: 141243187200 | elapsed time per iteration (s): 0.68 | learning rate: 9.298E-05 | global batch size: 256 | lm loss: 2.503823E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.906 | TFLOPs: 22.80 | 31: iteration 269500/ 476837 | consumed samples: 68992000 | consumed tokens: 141295616000 | elapsed time per iteration (s): 0.68 | learning rate: 9.292E-05 | global batch size: 256 | lm loss: 2.508902E+00 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.829 | TFLOPs: 22.80 | 31: iteration 269600/ 476837 | consumed samples: 69017600 | consumed tokens: 141348044800 | elapsed time per iteration (s): 0.68 | learning rate: 9.286E-05 | global batch size: 256 | lm loss: 2.509559E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.948 | TFLOPs: 22.80 | 31: iteration 269700/ 476837 | consumed samples: 69043200 | consumed tokens: 141400473600 | elapsed time per iteration (s): 0.68 | learning rate: 9.280E-05 | global batch size: 256 | lm loss: 2.511003E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.004 | TFLOPs: 22.75 | 31: iteration 269800/ 476837 | consumed samples: 69068800 | consumed tokens: 141452902400 | elapsed time per iteration (s): 0.69 | learning rate: 9.274E-05 | global batch size: 256 | lm loss: 2.514344E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.064 | TFLOPs: 22.57 | 31: iteration 269900/ 476837 | consumed samples: 69094400 | consumed tokens: 141505331200 | 
elapsed time per iteration (s): 0.68 | learning rate: 9.268E-05 | global batch size: 256 | lm loss: 2.505381E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.646 | TFLOPs: 22.73 | 0: [2023-04-28 00:56:46,409] [INFO] [logging.py:68:log_dist] [Rank 0] step=270000, skipped=0, lr=[9.262228875190313e-05, 9.262228875190313e-05, 9.262228875190313e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 270000/ 476837 | consumed samples: 69120000 | consumed tokens: 141557760000 | elapsed time per iteration (s): 0.69 | learning rate: 9.262E-05 | global batch size: 256 | lm loss: 2.509387E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.267 | TFLOPs: 22.58 | 0: steps: 270000 loss: 2.4794 iter time (s): 0.685 samples/sec: 373.973 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 270000 | lm loss value: 2.932091E+00 | lm loss PPL: 1.876682E+01 | 31: ------------------------------------------------------------------------------------------------- 31: iteration 270100/ 476837 | consumed samples: 69145600 | consumed tokens: 141610188800 | elapsed time per iteration (s): 0.69 | learning rate: 9.256E-05 | global batch size: 256 | lm loss: 2.509032E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.527 | TFLOPs: 22.48 | 31: iteration 270200/ 476837 | consumed samples: 69171200 | consumed tokens: 141662617600 | elapsed time per iteration (s): 0.68 | learning rate: 9.250E-05 | global batch size: 256 | lm loss: 2.510442E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.244 | TFLOPs: 22.64 | 31: iteration 270300/ 476837 | consumed samples: 69196800 | consumed tokens: 
141715046400 | elapsed time per iteration (s): 0.68 | learning rate: 9.245E-05 | global batch size: 256 | lm loss: 2.506475E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.412 | TFLOPs: 22.71 | 31: iteration 270400/ 476837 | consumed samples: 69222400 | consumed tokens: 141767475200 | elapsed time per iteration (s): 0.68 | learning rate: 9.239E-05 | global batch size: 256 | lm loss: 2.506000E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.830 | TFLOPs: 22.80 | 31: iteration 270500/ 476837 | consumed samples: 69248000 | consumed tokens: 141819904000 | elapsed time per iteration (s): 0.68 | learning rate: 9.233E-05 | global batch size: 256 | lm loss: 2.508709E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.919 | TFLOPs: 22.80 | 31: iteration 270600/ 476837 | consumed samples: 69273600 | consumed tokens: 141872332800 | elapsed time per iteration (s): 0.68 | learning rate: 9.227E-05 | global batch size: 256 | lm loss: 2.507919E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.030 | TFLOPs: 22.75 | 31: iteration 270700/ 476837 | consumed samples: 69299200 | consumed tokens: 141924761600 | elapsed time per iteration (s): 0.68 | learning rate: 9.221E-05 | global batch size: 256 | lm loss: 2.511348E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.867 | TFLOPs: 22.80 | 31: iteration 270800/ 476837 | consumed samples: 69324800 | consumed tokens: 141977190400 | elapsed time per iteration (s): 0.68 | learning rate: 9.215E-05 | global batch size: 256 | lm loss: 2.508308E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 376.389 | TFLOPs: 22.77 | 31: iteration 270900/ 476837 | consumed samples: 69350400 | consumed tokens: 142029619200 | elapsed time per iteration (s): 0.68 | learning rate: 9.209E-05 | global batch size: 256 | lm loss: 2.506908E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.895 | TFLOPs: 22.80 | 31: iteration 271000/ 476837 | consumed samples: 69376000 | consumed tokens: 142082048000 | elapsed time per iteration (s): 0.68 | learning rate: 9.204E-05 | global batch size: 256 | lm loss: 2.509050E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.844 | TFLOPs: 22.80 | 31: iteration 271100/ 476837 | consumed samples: 69401600 | consumed tokens: 142134476800 | elapsed time per iteration (s): 0.68 | learning rate: 9.198E-05 | global batch size: 256 | lm loss: 2.507513E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.992 | TFLOPs: 22.75 | 31: iteration 271200/ 476837 | consumed samples: 69427200 | consumed tokens: 142186905600 | elapsed time per iteration (s): 0.74 | learning rate: 9.192E-05 | global batch size: 256 | lm loss: 2.509993E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 346.703 | TFLOPs: 20.97 | 31: iteration 271300/ 476837 | consumed samples: 69452800 | consumed tokens: 142239334400 | elapsed time per iteration (s): 0.76 | learning rate: 9.186E-05 | global batch size: 256 | lm loss: 2.507505E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 336.966 | TFLOPs: 20.39 | 31: iteration 271400/ 476837 | consumed samples: 69478400 | consumed tokens: 142291763200 | elapsed time per iteration (s): 0.68 | learning rate: 9.180E-05 | global batch size: 256 | lm 
loss: 2.499793E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.963 | TFLOPs: 22.74 | 31: iteration 271500/ 476837 | consumed samples: 69504000 | consumed tokens: 142344192000 | elapsed time per iteration (s): 0.68 | learning rate: 9.174E-05 | global batch size: 256 | lm loss: 2.510514E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.375 | TFLOPs: 22.77 | 31: iteration 271600/ 476837 | consumed samples: 69529600 | consumed tokens: 142396620800 | elapsed time per iteration (s): 0.68 | learning rate: 9.168E-05 | global batch size: 256 | lm loss: 2.509287E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.708 | TFLOPs: 22.79 | 31: iteration 271700/ 476837 | consumed samples: 69555200 | consumed tokens: 142449049600 | elapsed time per iteration (s): 0.69 | learning rate: 9.162E-05 | global batch size: 256 | lm loss: 2.507915E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.811 | TFLOPs: 22.55 | 31: iteration 271800/ 476837 | consumed samples: 69580800 | consumed tokens: 142501478400 | elapsed time per iteration (s): 0.68 | learning rate: 9.157E-05 | global batch size: 256 | lm loss: 2.510839E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.161 | TFLOPs: 22.76 | 31: iteration 271900/ 476837 | consumed samples: 69606400 | consumed tokens: 142553907200 | elapsed time per iteration (s): 0.68 | learning rate: 9.151E-05 | global batch size: 256 | lm loss: 2.507885E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.641 | TFLOPs: 22.73 | 0: [2023-04-28 01:19:42,407] [INFO] [logging.py:68:log_dist] [Rank 
0] step=272000, skipped=0, lr=[9.144851429797729e-05, 9.144851429797729e-05, 9.144851429797729e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 272000/ 476837 | consumed samples: 69632000 | consumed tokens: 142606336000 | elapsed time per iteration (s): 0.68 | learning rate: 9.145E-05 | global batch size: 256 | lm loss: 2.509932E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.870 | TFLOPs: 22.80 | 0: steps: 272000 loss: 2.5029 iter time (s): 0.685 samples/sec: 373.944 31: iteration 272100/ 476837 | consumed samples: 69657600 | consumed tokens: 142658764800 | elapsed time per iteration (s): 0.68 | learning rate: 9.139E-05 | global batch size: 256 | lm loss: 2.509614E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.683 | TFLOPs: 22.73 | 31: iteration 272200/ 476837 | consumed samples: 69683200 | consumed tokens: 142711193600 | elapsed time per iteration (s): 0.68 | learning rate: 9.133E-05 | global batch size: 256 | lm loss: 2.502896E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.243 | TFLOPs: 22.64 | 31: iteration 272300/ 476837 | consumed samples: 69708800 | consumed tokens: 142763622400 | elapsed time per iteration (s): 0.68 | learning rate: 9.127E-05 | global batch size: 256 | lm loss: 2.505025E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.835 | TFLOPs: 22.80 | 31: iteration 272400/ 476837 | consumed samples: 69734400 | consumed tokens: 142816051200 | elapsed time per iteration (s): 0.68 | learning rate: 9.121E-05 | global batch size: 256 | lm loss: 2.502729E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.910 | TFLOPs: 22.80 | 31: iteration 
272500/ 476837 | consumed samples: 69760000 | consumed tokens: 142868480000 | elapsed time per iteration (s): 0.68 | learning rate: 9.116E-05 | global batch size: 256 | lm loss: 2.508587E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.047 | TFLOPs: 22.81 | 31: iteration 272600/ 476837 | consumed samples: 69785600 | consumed tokens: 142920908800 | elapsed time per iteration (s): 0.68 | learning rate: 9.110E-05 | global batch size: 256 | lm loss: 2.509317E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.078 | TFLOPs: 22.81 | 31: iteration 272700/ 476837 | consumed samples: 69811200 | consumed tokens: 142973337600 | elapsed time per iteration (s): 0.68 | learning rate: 9.104E-05 | global batch size: 256 | lm loss: 2.506629E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.068 | TFLOPs: 22.81 | 31: iteration 272800/ 476837 | consumed samples: 69836800 | consumed tokens: 143025766400 | elapsed time per iteration (s): 0.68 | learning rate: 9.098E-05 | global batch size: 256 | lm loss: 2.507749E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.573 | TFLOPs: 22.78 | 31: iteration 272900/ 476837 | consumed samples: 69862400 | consumed tokens: 143078195200 | elapsed time per iteration (s): 0.68 | learning rate: 9.092E-05 | global batch size: 256 | lm loss: 2.506616E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.986 | TFLOPs: 22.81 | 31: iteration 273000/ 476837 | consumed samples: 69888000 | consumed tokens: 143130624000 | elapsed time per iteration (s): 0.68 | learning rate: 9.086E-05 | global batch size: 256 | lm loss: 2.505513E+00 | grad norm: 0.333 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.293 | TFLOPs: 22.70 | 31: iteration 273100/ 476837 | consumed samples: 69913600 | consumed tokens: 143183052800 | elapsed time per iteration (s): 0.68 | learning rate: 9.080E-05 | global batch size: 256 | lm loss: 2.507140E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.030 | TFLOPs: 22.69 | 31: iteration 273200/ 476837 | consumed samples: 69939200 | consumed tokens: 143235481600 | elapsed time per iteration (s): 0.68 | learning rate: 9.075E-05 | global batch size: 256 | lm loss: 2.508987E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.939 | TFLOPs: 22.74 | 31: iteration 273300/ 476837 | consumed samples: 69964800 | consumed tokens: 143287910400 | elapsed time per iteration (s): 0.68 | learning rate: 9.069E-05 | global batch size: 256 | lm loss: 2.507376E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.967 | TFLOPs: 22.81 | 31: iteration 273400/ 476837 | consumed samples: 69990400 | consumed tokens: 143340339200 | elapsed time per iteration (s): 0.68 | learning rate: 9.063E-05 | global batch size: 256 | lm loss: 2.507253E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.058 | TFLOPs: 22.75 | 31: iteration 273500/ 476837 | consumed samples: 70016000 | consumed tokens: 143392768000 | elapsed time per iteration (s): 0.68 | learning rate: 9.057E-05 | global batch size: 256 | lm loss: 2.508705E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.030 | TFLOPs: 22.81 | 31: iteration 273600/ 476837 | consumed samples: 70041600 | consumed tokens: 143445196800 | elapsed time per iteration (s): 
0.68 | learning rate: 9.051E-05 | global batch size: 256 | lm loss: 2.506694E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.970 | TFLOPs: 22.81 | 31: iteration 273700/ 476837 | consumed samples: 70067200 | consumed tokens: 143497625600 | elapsed time per iteration (s): 0.72 | learning rate: 9.045E-05 | global batch size: 256 | lm loss: 2.500547E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 355.019 | TFLOPs: 21.48 | 31: iteration 273800/ 476837 | consumed samples: 70092800 | consumed tokens: 143550054400 | elapsed time per iteration (s): 0.68 | learning rate: 9.039E-05 | global batch size: 256 | lm loss: 2.508763E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.164 | TFLOPs: 22.70 | 31: iteration 273900/ 476837 | consumed samples: 70118400 | consumed tokens: 143602483200 | elapsed time per iteration (s): 0.82 | learning rate: 9.034E-05 | global batch size: 256 | lm loss: 2.504252E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 313.416 | TFLOPs: 18.96 | 0: [2023-04-28 01:42:40,745] [INFO] [logging.py:68:log_dist] [Rank 0] step=274000, skipped=0, lr=[9.027802625179691e-05, 9.027802625179691e-05, 9.027802625179691e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 274000/ 476837 | consumed samples: 70144000 | consumed tokens: 143654912000 | elapsed time per iteration (s): 0.68 | learning rate: 9.028E-05 | global batch size: 256 | lm loss: 2.505091E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.597 | TFLOPs: 22.78 | 0: steps: 274000 loss: 2.4680 iter time (s): 0.686 samples/sec: 373.307 31: iteration 274100/ 476837 | consumed samples: 70169600 | consumed 
tokens: 143707340800 | elapsed time per iteration (s): 0.68 | learning rate: 9.022E-05 | global batch size: 256 | lm loss: 2.506095E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.944 | TFLOPs: 22.80 | 31: iteration 274200/ 476837 | consumed samples: 70195200 | consumed tokens: 143759769600 | elapsed time per iteration (s): 0.68 | learning rate: 9.016E-05 | global batch size: 256 | lm loss: 2.506573E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.945 | TFLOPs: 22.80 | 31: iteration 274300/ 476837 | consumed samples: 70220800 | consumed tokens: 143812198400 | elapsed time per iteration (s): 0.68 | learning rate: 9.010E-05 | global batch size: 256 | lm loss: 2.505813E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.292 | TFLOPs: 22.64 | 31: iteration 274400/ 476837 | consumed samples: 70246400 | consumed tokens: 143864627200 | elapsed time per iteration (s): 0.68 | learning rate: 9.004E-05 | global batch size: 256 | lm loss: 2.508912E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.880 | TFLOPs: 22.80 | 31: iteration 274500/ 476837 | consumed samples: 70272000 | consumed tokens: 143917056000 | elapsed time per iteration (s): 0.68 | learning rate: 8.999E-05 | global batch size: 256 | lm loss: 2.501017E+00 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.318 | TFLOPs: 22.77 | 31: iteration 274600/ 476837 | consumed samples: 70297600 | consumed tokens: 143969484800 | elapsed time per iteration (s): 0.68 | learning rate: 8.993E-05 | global batch size: 256 | lm loss: 2.502358E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 376.711 | TFLOPs: 22.79 | 31: iteration 274700/ 476837 | consumed samples: 70323200 | consumed tokens: 144021913600 | elapsed time per iteration (s): 0.68 | learning rate: 8.987E-05 | global batch size: 256 | lm loss: 2.503439E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.357 | TFLOPs: 22.77 | 31: iteration 274800/ 476837 | consumed samples: 70348800 | consumed tokens: 144074342400 | elapsed time per iteration (s): 0.68 | learning rate: 8.981E-05 | global batch size: 256 | lm loss: 2.505524E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.010 | TFLOPs: 22.81 | 31: iteration 274900/ 476837 | consumed samples: 70374400 | consumed tokens: 144126771200 | elapsed time per iteration (s): 0.68 | learning rate: 8.975E-05 | global batch size: 256 | lm loss: 2.506297E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.839 | TFLOPs: 22.74 | 31: iteration 275000/ 476837 | consumed samples: 70400000 | consumed tokens: 144179200000 | elapsed time per iteration (s): 0.68 | learning rate: 8.969E-05 | global batch size: 256 | lm loss: 2.506507E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.797 | TFLOPs: 22.73 | 31: iteration 275100/ 476837 | consumed samples: 70425600 | consumed tokens: 144231628800 | elapsed time per iteration (s): 0.68 | learning rate: 8.964E-05 | global batch size: 256 | lm loss: 2.505219E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.917 | TFLOPs: 22.80 | 31: iteration 275200/ 476837 | consumed samples: 70451200 | consumed tokens: 144284057600 | elapsed time per iteration (s): 0.68 | learning rate: 8.958E-05 | global batch size: 256 | 
lm loss: 2.501701E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.823 | TFLOPs: 22.80 | 31: iteration 275300/ 476837 | consumed samples: 70476800 | consumed tokens: 144336486400 | elapsed time per iteration (s): 0.68 | learning rate: 8.952E-05 | global batch size: 256 | lm loss: 2.511637E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.081 | TFLOPs: 22.75 | 31: iteration 275400/ 476837 | consumed samples: 70502400 | consumed tokens: 144388915200 | elapsed time per iteration (s): 0.68 | learning rate: 8.946E-05 | global batch size: 256 | lm loss: 2.507206E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.452 | TFLOPs: 22.77 | 31: iteration 275500/ 476837 | consumed samples: 70528000 | consumed tokens: 144441344000 | elapsed time per iteration (s): 0.68 | learning rate: 8.940E-05 | global batch size: 256 | lm loss: 2.506143E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.793 | TFLOPs: 22.73 | 31: iteration 275600/ 476837 | consumed samples: 70553600 | consumed tokens: 144493772800 | elapsed time per iteration (s): 0.68 | learning rate: 8.934E-05 | global batch size: 256 | lm loss: 2.498410E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.270 | TFLOPs: 22.70 | 31: iteration 275700/ 476837 | consumed samples: 70579200 | consumed tokens: 144546201600 | elapsed time per iteration (s): 0.68 | learning rate: 8.929E-05 | global batch size: 256 | lm loss: 2.504774E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.139 | TFLOPs: 22.63 | 31: iteration 275800/ 476837 | consumed samples: 70604800 | 
consumed tokens: 144598630400 | elapsed time per iteration (s): 0.69 | learning rate: 8.923E-05 | global batch size: 256 | lm loss: 2.501172E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.221 | TFLOPs: 22.58 | 31: iteration 275900/ 476837 | consumed samples: 70630400 | consumed tokens: 144651059200 | elapsed time per iteration (s): 0.68 | learning rate: 8.917E-05 | global batch size: 256 | lm loss: 2.503115E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.459 | TFLOPs: 22.71 | 0: [2023-04-28 02:05:22,645] [INFO] [logging.py:68:log_dist] [Rank 0] step=276000, skipped=0, lr=[8.911103196607998e-05, 8.911103196607998e-05, 8.911103196607998e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 276000/ 476837 | consumed samples: 70656000 | consumed tokens: 144703488000 | elapsed time per iteration (s): 0.68 | learning rate: 8.911E-05 | global batch size: 256 | lm loss: 2.503765E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.723 | TFLOPs: 22.73 | 0: steps: 276000 loss: 2.4930 iter time (s): 0.677 samples/sec: 377.935 31: iteration 276100/ 476837 | consumed samples: 70681600 | consumed tokens: 144755916800 | elapsed time per iteration (s): 0.68 | learning rate: 8.905E-05 | global batch size: 256 | lm loss: 2.504508E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.885 | TFLOPs: 22.80 | 31: iteration 276200/ 476837 | consumed samples: 70707200 | consumed tokens: 144808345600 | elapsed time per iteration (s): 0.68 | learning rate: 8.899E-05 | global batch size: 256 | lm loss: 2.505710E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.992 | TFLOPs: 22.81 | 31: 
iteration 276300/ 476837 | consumed samples: 70732800 | consumed tokens: 144860774400 | elapsed time per iteration (s): 0.68 | learning rate: 8.894E-05 | global batch size: 256 | lm loss: 2.500687E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.075 | TFLOPs: 22.75 | 31: iteration 276400/ 476837 | consumed samples: 70758400 | consumed tokens: 144913203200 | elapsed time per iteration (s): 0.68 | learning rate: 8.888E-05 | global batch size: 256 | lm loss: 2.503944E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.536 | TFLOPs: 22.78 | 31: iteration 276500/ 476837 | consumed samples: 70784000 | consumed tokens: 144965632000 | elapsed time per iteration (s): 0.72 | learning rate: 8.882E-05 | global batch size: 256 | lm loss: 2.502587E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 355.689 | TFLOPs: 21.52 | 31: iteration 276600/ 476837 | consumed samples: 70809600 | consumed tokens: 145018060800 | elapsed time per iteration (s): 0.78 | learning rate: 8.876E-05 | global batch size: 256 | lm loss: 2.506670E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 327.533 | TFLOPs: 19.81 | 31: iteration 276700/ 476837 | consumed samples: 70835200 | consumed tokens: 145070489600 | elapsed time per iteration (s): 0.68 | learning rate: 8.870E-05 | global batch size: 256 | lm loss: 2.507995E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.013 | TFLOPs: 22.81 | 31: iteration 276800/ 476837 | consumed samples: 70860800 | consumed tokens: 145122918400 | elapsed time per iteration (s): 0.68 | learning rate: 8.865E-05 | global batch size: 256 | lm loss: 2.501692E+00 | grad norm: 0.349 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.278 | TFLOPs: 22.76 | 31: iteration 276900/ 476837 | consumed samples: 70886400 | consumed tokens: 145175347200 | elapsed time per iteration (s): 0.68 | learning rate: 8.859E-05 | global batch size: 256 | lm loss: 2.503569E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.613 | TFLOPs: 22.72 | 31: iteration 277000/ 476837 | consumed samples: 70912000 | consumed tokens: 145227776000 | elapsed time per iteration (s): 0.68 | learning rate: 8.853E-05 | global batch size: 256 | lm loss: 2.503048E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.076 | TFLOPs: 22.75 | 31: iteration 277100/ 476837 | consumed samples: 70937600 | consumed tokens: 145280204800 | elapsed time per iteration (s): 0.68 | learning rate: 8.847E-05 | global batch size: 256 | lm loss: 2.499630E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.388 | TFLOPs: 22.77 | 31: iteration 277200/ 476837 | consumed samples: 70963200 | consumed tokens: 145332633600 | elapsed time per iteration (s): 0.68 | learning rate: 8.841E-05 | global batch size: 256 | lm loss: 2.505873E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.899 | TFLOPs: 22.80 | 31: iteration 277300/ 476837 | consumed samples: 70988800 | consumed tokens: 145385062400 | elapsed time per iteration (s): 0.68 | learning rate: 8.835E-05 | global batch size: 256 | lm loss: 2.508539E+00 | grad norm: 0.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.025 | TFLOPs: 22.81 | 31: iteration 277400/ 476837 | consumed samples: 71014400 | consumed tokens: 145437491200 | elapsed time per iteration 
(s): 0.68 | learning rate: 8.830E-05 | global batch size: 256 | lm loss: 2.505870E+00 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.073 | TFLOPs: 22.63 | 31: iteration 277500/ 476837 | consumed samples: 71040000 | consumed tokens: 145489920000 | elapsed time per iteration (s): 0.68 | learning rate: 8.824E-05 | global batch size: 256 | lm loss: 2.504211E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.120 | TFLOPs: 22.75 | 31: iteration 277600/ 476837 | consumed samples: 71065600 | consumed tokens: 145542348800 | elapsed time per iteration (s): 0.68 | learning rate: 8.818E-05 | global batch size: 256 | lm loss: 2.502389E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.558 | TFLOPs: 22.72 | 31: iteration 277700/ 476837 | consumed samples: 71091200 | consumed tokens: 145594777600 | elapsed time per iteration (s): 0.68 | learning rate: 8.812E-05 | global batch size: 256 | lm loss: 2.498665E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.983 | TFLOPs: 22.81 | 31: iteration 277800/ 476837 | consumed samples: 71116800 | consumed tokens: 145647206400 | elapsed time per iteration (s): 0.68 | learning rate: 8.806E-05 | global batch size: 256 | lm loss: 2.499185E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.826 | TFLOPs: 22.74 | 31: iteration 277900/ 476837 | consumed samples: 71142400 | consumed tokens: 145699635200 | elapsed time per iteration (s): 0.68 | learning rate: 8.801E-05 | global batch size: 256 | lm loss: 2.501189E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.288 | TFLOPs: 22.64 | 
0: [2023-04-28 02:28:17,748] [INFO] [logging.py:68:log_dist] [Rank 0] step=278000, skipped=0, lr=[8.79477381746224e-05, 8.79477381746224e-05, 8.79477381746224e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 278000/ 476837 | consumed samples: 71168000 | consumed tokens: 145752064000 | elapsed time per iteration (s): 0.68 | learning rate: 8.795E-05 | global batch size: 256 | lm loss: 2.498287E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.511 | TFLOPs: 22.78 | 0: steps: 278000 loss: 2.4697 iter time (s): 0.684 samples/sec: 374.006 31: iteration 278100/ 476837 | consumed samples: 71193600 | consumed tokens: 145804492800 | elapsed time per iteration (s): 0.68 | learning rate: 8.789E-05 | global batch size: 256 | lm loss: 2.499079E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.400 | TFLOPs: 22.77 | 31: iteration 278200/ 476837 | consumed samples: 71219200 | consumed tokens: 145856921600 | elapsed time per iteration (s): 0.68 | learning rate: 8.783E-05 | global batch size: 256 | lm loss: 2.501371E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.845 | TFLOPs: 22.80 | 31: iteration 278300/ 476837 | consumed samples: 71244800 | consumed tokens: 145909350400 | elapsed time per iteration (s): 0.68 | learning rate: 8.777E-05 | global batch size: 256 | lm loss: 2.499240E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.006 | TFLOPs: 22.81 | 31: iteration 278400/ 476837 | consumed samples: 71270400 | consumed tokens: 145961779200 | elapsed time per iteration (s): 0.68 | learning rate: 8.772E-05 | global batch size: 256 | lm loss: 2.500217E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 376.884 | TFLOPs: 22.80 | 31: iteration 278500/ 476837 | consumed samples: 71296000 | consumed tokens: 146014208000 | elapsed time per iteration (s): 0.68 | learning rate: 8.766E-05 | global batch size: 256 | lm loss: 2.501840E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.866 | TFLOPs: 22.80 | 31: iteration 278600/ 476837 | consumed samples: 71321600 | consumed tokens: 146066636800 | elapsed time per iteration (s): 0.68 | learning rate: 8.760E-05 | global batch size: 256 | lm loss: 2.502780E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.952 | TFLOPs: 22.80 | 31: iteration 278700/ 476837 | consumed samples: 71347200 | consumed tokens: 146119065600 | elapsed time per iteration (s): 0.68 | learning rate: 8.754E-05 | global batch size: 256 | lm loss: 2.504872E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.780 | TFLOPs: 22.61 | 31: iteration 278800/ 476837 | consumed samples: 71372800 | consumed tokens: 146171494400 | elapsed time per iteration (s): 0.68 | learning rate: 8.748E-05 | global batch size: 256 | lm loss: 2.498167E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.328 | TFLOPs: 22.65 | 31: iteration 278900/ 476837 | consumed samples: 71398400 | consumed tokens: 146223923200 | elapsed time per iteration (s): 0.68 | learning rate: 8.743E-05 | global batch size: 256 | lm loss: 2.500461E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.353 | TFLOPs: 22.65 | 31: iteration 279000/ 476837 | consumed samples: 71424000 | consumed tokens: 146276352000 | elapsed time per iteration (s): 0.68 | learning rate: 8.737E-05 | global batch size: 256 | 
lm loss: 2.500417E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.875 | TFLOPs: 22.80 | 31: iteration 279100/ 476837 | consumed samples: 71449600 | consumed tokens: 146328780800 | elapsed time per iteration (s): 0.68 | learning rate: 8.731E-05 | global batch size: 256 | lm loss: 2.495574E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.169 | TFLOPs: 22.76 | 31: iteration 279200/ 476837 | consumed samples: 71475200 | consumed tokens: 146381209600 | elapsed time per iteration (s): 0.73 | learning rate: 8.725E-05 | global batch size: 256 | lm loss: 2.502534E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 350.199 | TFLOPs: 21.19 | 31: iteration 279300/ 476837 | consumed samples: 71500800 | consumed tokens: 146433638400 | elapsed time per iteration (s): 0.78 | learning rate: 8.719E-05 | global batch size: 256 | lm loss: 2.496943E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 330.290 | TFLOPs: 19.98 | 31: iteration 279400/ 476837 | consumed samples: 71526400 | consumed tokens: 146486067200 | elapsed time per iteration (s): 0.68 | learning rate: 8.714E-05 | global batch size: 256 | lm loss: 2.502099E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.798 | TFLOPs: 22.80 | 31: iteration 279500/ 476837 | consumed samples: 71552000 | consumed tokens: 146538496000 | elapsed time per iteration (s): 0.68 | learning rate: 8.708E-05 | global batch size: 256 | lm loss: 2.499186E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.650 | TFLOPs: 22.73 | 31: iteration 279600/ 476837 | consumed samples: 71577600 | 
consumed tokens: 146590924800 | elapsed time per iteration (s): 0.68 | learning rate: 8.702E-05 | global batch size: 256 | lm loss: 2.496123E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.739 | TFLOPs: 22.73 | 31: iteration 279700/ 476837 | consumed samples: 71603200 | consumed tokens: 146643353600 | elapsed time per iteration (s): 0.68 | learning rate: 8.696E-05 | global batch size: 256 | lm loss: 2.499133E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.720 | TFLOPs: 22.79 | 31: iteration 279800/ 476837 | consumed samples: 71628800 | consumed tokens: 146695782400 | elapsed time per iteration (s): 0.68 | learning rate: 8.690E-05 | global batch size: 256 | lm loss: 2.500486E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.647 | TFLOPs: 22.79 | 31: iteration 279900/ 476837 | consumed samples: 71654400 | consumed tokens: 146748211200 | elapsed time per iteration (s): 0.68 | learning rate: 8.685E-05 | global batch size: 256 | lm loss: 2.498786E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.690 | TFLOPs: 22.79 | 0: [2023-04-28 02:51:13,432] [INFO] [logging.py:68:log_dist] [Rank 0] step=280000, skipped=0, lr=[8.678835095567519e-05, 8.678835095567519e-05, 8.678835095567519e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 280000/ 476837 | consumed samples: 71680000 | consumed tokens: 146800640000 | elapsed time per iteration (s): 0.68 | learning rate: 8.679E-05 | global batch size: 256 | lm loss: 2.498541E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.869 | TFLOPs: 22.74 | 0: steps: 280000 loss: 2.4788 iter time (s): 0.684 samples/sec: 374.011 31: 
------------------------------------------------------------------------------------------------- 31: validation loss at iteration 280000 | lm loss value: 2.920857E+00 | lm loss PPL: 1.855718E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 280000 to checkpoints_1b1250b1b5 0: [2023-04-28 02:51:13,718] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step280000 is begin to save! 0: [2023-04-28 02:51:13,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_01-model_00-model_states.pt... 0: [2023-04-28 02:51:13,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_01-model_00-model_states.pt. 0: [2023-04-28 02:51:13,970] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_03-model_00-model_states.pt... 0: [2023-04-28 02:51:14,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_03-model_00-model_states.pt. 0: [2023-04-28 02:51:14,064] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_04-model_00-model_states.pt... 0: [2023-04-28 02:51:14,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_04-model_00-model_states.pt. 0: [2023-04-28 02:51:14,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_05-model_00-model_states.pt... 0: [2023-04-28 02:51:14,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_05-model_00-model_states.pt. 0: [2023-04-28 02:51:14,220] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_06-model_00-model_states.pt... 
0: [2023-04-28 02:51:14,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_06-model_00-model_states.pt. 0: [2023-04-28 02:51:14,309] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_07-model_00-model_states.pt... 0: [2023-04-28 02:51:14,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_07-model_00-model_states.pt. 0: [2023-04-28 02:51:14,398] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_08-model_00-model_states.pt... 0: [2023-04-28 02:51:14,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_08-model_00-model_states.pt. 0: [2023-04-28 02:51:14,488] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_09-model_00-model_states.pt... 0: [2023-04-28 02:51:14,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_09-model_00-model_states.pt. 0: [2023-04-28 02:51:14,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_10-model_00-model_states.pt... 0: [2023-04-28 02:51:14,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_10-model_00-model_states.pt. 0: [2023-04-28 02:51:14,673] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_11-model_00-model_states.pt... 0: [2023-04-28 02:51:14,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_11-model_00-model_states.pt. 0: [2023-04-28 02:51:14,764] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_12-model_00-model_states.pt... 
0: [2023-04-28 02:51:14,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_12-model_00-model_states.pt. 0: [2023-04-28 02:51:14,856] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_13-model_00-model_states.pt... 0: [2023-04-28 02:51:14,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_13-model_00-model_states.pt. 0: [2023-04-28 02:51:14,948] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_14-model_00-model_states.pt... 0: [2023-04-28 02:51:15,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_14-model_00-model_states.pt. 0: [2023-04-28 02:51:15,037] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_15-model_00-model_states.pt... 0: [2023-04-28 02:51:15,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_15-model_00-model_states.pt. 0: [2023-04-28 02:51:15,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_16-model_00-model_states.pt... 0: [2023-04-28 02:51:15,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_16-model_00-model_states.pt. 0: [2023-04-28 02:51:15,213] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_17-model_00-model_states.pt... 0: [2023-04-28 02:51:15,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_17-model_00-model_states.pt. 0: [2023-04-28 02:51:15,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_18-model_00-model_states.pt... 
0: [2023-04-28 02:51:15,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_18-model_00-model_states.pt. 0: [2023-04-28 02:51:15,393] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_19-model_00-model_states.pt... 0: [2023-04-28 02:51:15,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_19-model_00-model_states.pt. 0: [2023-04-28 02:51:15,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_20-model_00-model_states.pt... 0: [2023-04-28 02:51:15,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_20-model_00-model_states.pt. 0: [2023-04-28 02:51:15,576] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_21-model_00-model_states.pt... 0: [2023-04-28 02:51:15,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_21-model_00-model_states.pt. 0: [2023-04-28 02:51:15,665] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_22-model_00-model_states.pt... 0: [2023-04-28 02:51:15,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_22-model_00-model_states.pt. 0: [2023-04-28 02:51:15,754] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_23-model_00-model_states.pt... 0: [2023-04-28 02:51:15,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_23-model_00-model_states.pt. 0: [2023-04-28 02:51:15,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_24-model_00-model_states.pt... 
0: [2023-04-28 02:51:15,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_24-model_00-model_states.pt. 0: [2023-04-28 02:51:15,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_25-model_00-model_states.pt... 0: [2023-04-28 02:51:16,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_25-model_00-model_states.pt. 0: [2023-04-28 02:51:16,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_26-model_00-model_states.pt... 0: [2023-04-28 02:51:16,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_26-model_00-model_states.pt. 0: [2023-04-28 02:51:16,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_27-model_00-model_states.pt... 0: [2023-04-28 02:51:16,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_27-model_00-model_states.pt. 0: [2023-04-28 02:51:16,187] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_28-model_00-model_states.pt... 0: [2023-04-28 02:51:16,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_28-model_00-model_states.pt. 0: [2023-04-28 02:51:16,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/layer_30-model_00-model_states.pt... 0: [2023-04-28 02:51:16,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/layer_30-model_00-model_states.pt. 
0: [2023-04-28 02:51:16,281] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step280000/mp_rank_00_model_states.pt 0: [2023-04-28 02:51:16,281] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/mp_rank_00_model_states.pt... 0: [2023-04-28 02:51:16,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/mp_rank_00_model_states.pt. 0: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 9: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 0: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 11: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 11: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 11: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 3: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 10: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 10: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 
9: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 9: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 12: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 12: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 12: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 27: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 21: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 21: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 21: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 
29: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 29: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 29: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 28: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 28: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 28: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 28: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 30: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 30: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 30: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 
16: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 16: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 16: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 0: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 4: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 1: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 5: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 8: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 8: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 3: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 9: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 14: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 14: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 14: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 15: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 
15: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 15: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 12: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 13: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 13: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 20: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 20: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 
19: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 19: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 19: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 18: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 18: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 18: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 24: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 24: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 24: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 24: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 
17: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 17: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 17: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 27: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 27: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 21: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 23: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 23: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 23: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 23: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 
29: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 29: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 28: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 28: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 28: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 26: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 30: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 31: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 
31: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 31: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 16: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 16: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 16: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 22: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 22: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 6: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
0: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 4: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
8: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 8: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 11: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 11: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 11: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 3: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 10: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 10: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 9: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 14: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 
15: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 15: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 12: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 12: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 12: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 12: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 13: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 20: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 19: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 18: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 24: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 
24: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 24: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 17: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 27: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 21: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 23: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 23: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 23: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 23: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 29: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 
25: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 25: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 25: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 28: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 26: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 30: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 30: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 31: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 16: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 
22: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 6: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 2: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 8: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 
11: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 3: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 10: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 10: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 9: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 14: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 15: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 15: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 13: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 20: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 
19: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 19: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 19: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 18: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 18: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 24: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 17: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 27: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 27: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 21: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 
29: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 25: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 26: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 30: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 30: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 31: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 16: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 22: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 22: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 22: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 6: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
0: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 4: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 1: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 8: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 8: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 10: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 9: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 14: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 14: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 
14: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 13: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 20: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 19: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 18: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 17: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 27: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 21: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 29: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 25: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 
25: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 26: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 26: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 26: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 31: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 22: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 6: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 8: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 10: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 
9: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 13: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 20: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 20: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 18: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 17: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 25: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 26: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 31: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 6: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 4: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 20: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 17: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 6: [2023-04-28 02:51:16,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 3: [2023-04-28 02:51:16,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-28 02:51:16,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-28 02:51:16,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 13: [2023-04-28 02:51:16,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 22: [2023-04-28 02:51:16,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 
13: [2023-04-28 02:51:16,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 22: [2023-04-28 02:51:16,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 13: [2023-04-28 02:51:16,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 19: [2023-04-28 02:51:16,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 22: [2023-04-28 02:51:16,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 19: [2023-04-28 02:51:16,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-28 02:51:16,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 0: [2023-04-28 02:51:16,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-04-28 02:51:16,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-28 02:51:16,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 16: [2023-04-28 02:51:16,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 
16: [2023-04-28 02:51:16,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-28 02:51:16,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 8: [2023-04-28 02:51:16,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 0: [2023-04-28 02:51:16,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-28 02:51:16,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-28 02:51:16,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 0: [2023-04-28 02:51:16,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-28 02:51:16,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-28 02:51:16,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 1: [2023-04-28 02:51:16,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-04-28 02:51:16,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-28 02:51:16,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
10: [2023-04-28 02:51:16,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 10: [2023-04-28 02:51:16,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-28 02:51:16,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 1: [2023-04-28 02:51:16,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-28 02:51:16,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-28 02:51:16,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 22: [2023-04-28 02:51:16,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2023-04-28 02:51:16,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-28 02:51:16,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 8: [2023-04-28 02:51:16,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-28 02:51:16,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 1: [2023-04-28 02:51:16,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
8: [2023-04-28 02:51:16,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 8: [2023-04-28 02:51:16,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 1: [2023-04-28 02:51:16,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 8: [2023-04-28 02:51:16,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 1: [2023-04-28 02:51:16,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 1: [2023-04-28 02:51:16,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-04-28 02:51:16,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-28 02:51:16,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-04-28 02:51:16,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-28 02:51:16,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-28 02:51:16,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-28 02:51:16,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 1: [2023-04-28 02:51:16,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 1: [2023-04-28 02:51:16,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 0: [2023-04-28 02:51:16,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-28 02:51:16,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-28 02:51:16,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 1: [2023-04-28 02:51:16,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-28 02:51:16,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-28 02:51:16,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
1: [2023-04-28 02:51:16,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-04-28 02:51:16,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-28 02:51:16,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 19: [2023-04-28 02:51:16,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 19: [2023-04-28 02:51:16,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-28 02:51:16,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 19: [2023-04-28 02:51:16,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-28 02:51:16,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-28 02:51:16,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 19: [2023-04-28 02:51:16,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2023-04-28 02:51:16,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-28 02:51:16,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
8: [2023-04-28 02:51:16,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 17: [2023-04-28 02:51:16,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 13: [2023-04-28 02:51:16,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2023-04-28 02:51:16,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-28 02:51:16,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-28 02:51:16,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-28 02:51:16,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 13: [2023-04-28 02:51:16,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 19: [2023-04-28 02:51:16,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-28 02:51:16,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-28 02:51:16,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
0: [2023-04-28 02:51:16,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-04-28 02:51:16,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-28 02:51:16,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 0: [2023-04-28 02:51:16,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 19: [2023-04-28 02:51:16,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2023-04-28 02:51:16,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-28 02:51:16,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 22: [2023-04-28 02:51:16,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-28 02:51:16,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 19: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 
19: [2023-04-28 02:51:16,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 8: [2023-04-28 02:51:16,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-28 02:51:16,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 4: [2023-04-28 02:51:16,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 12: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 
12: [2023-04-28 02:51:16,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-28 02:51:16,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-28 02:51:16,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-28 02:51:16,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-28 02:51:16,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-28 02:51:16,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 12: [2023-04-28 02:51:16,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 12: [2023-04-28 02:51:16,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 12: [2023-04-28 02:51:16,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 12: [2023-04-28 02:51:16,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 12: [2023-04-28 02:51:16,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-28 02:51:16,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
8: [2023-04-28 02:51:16,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 13: [2023-04-28 02:51:16,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 22: [2023-04-28 02:51:16,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 13: [2023-04-28 02:51:16,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2023-04-28 02:51:16,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 22: [2023-04-28 02:51:16,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-28 02:51:16,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 13: [2023-04-28 02:51:16,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2023-04-28 02:51:16,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-28 02:51:16,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-28 02:51:16,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
13: [2023-04-28 02:51:16,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-28 02:51:16,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 19: [2023-04-28 02:51:16,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-28 02:51:16,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-28 02:51:16,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 3: [2023-04-28 02:51:16,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-04-28 02:51:16,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-28 02:51:16,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 3: [2023-04-28 02:51:16,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-04-28 02:51:16,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-04-28 02:51:16,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-28 02:51:16,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-28 02:51:16,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 3: [2023-04-28 02:51:16,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 3: [2023-04-28 02:51:16,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-04-28 02:51:16,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 3: [2023-04-28 02:51:16,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-04-28 02:51:16,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-28 02:51:16,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-28 02:51:16,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-28 02:51:16,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
3: [2023-04-28 02:51:16,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 8: [2023-04-28 02:51:16,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-28 02:51:16,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 8: [2023-04-28 02:51:16,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 8: [2023-04-28 02:51:16,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-28 02:51:16,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 8: [2023-04-28 02:51:16,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2023-04-28 02:51:16,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 8: [2023-04-28 02:51:16,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-28 02:51:16,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2023-04-28 02:51:16,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 8: [2023-04-28 02:51:16,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
3: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-28 02:51:16,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 29: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 
29: [2023-04-28 02:51:16,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-28 02:51:16,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2023-04-28 02:51:16,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-28 02:51:16,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 29: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 29: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 29: [2023-04-28 02:51:16,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 29: [2023-04-28 02:51:16,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 7: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-04-28 02:51:16,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-28 02:51:16,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
7: [2023-04-28 02:51:16,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-28 02:51:16,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 7: [2023-04-28 02:51:16,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-04-28 02:51:16,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-28 02:51:16,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 12: [2023-04-28 02:51:16,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-28 02:51:16,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-28 02:51:16,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 12: [2023-04-28 02:51:16,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 12: [2023-04-28 02:51:16,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2023-04-28 02:51:16,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 29: [2023-04-28 02:51:16,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 
29: [2023-04-28 02:51:16,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2023-04-28 02:51:16,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 29: [2023-04-28 02:51:16,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-28 02:51:16,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-28 02:51:16,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 29: [2023-04-28 02:51:16,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-28 02:51:16,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-28 02:51:16,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 28: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 
28: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 28: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2023-04-28 02:51:16,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-28 02:51:16,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-28 02:51:16,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-28 02:51:16,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-28 02:51:16,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 28: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 28: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 28: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
22: [2023-04-28 02:51:16,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-28 02:51:16,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 28: [2023-04-28 02:51:16,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 22: [2023-04-28 02:51:16,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-28 02:51:16,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-28 02:51:16,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 22: [2023-04-28 02:51:16,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 22: [2023-04-28 02:51:16,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-28 02:51:16,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-28 02:51:16,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-28 02:51:16,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-28 02:51:16,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
22: [2023-04-28 02:51:16,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 29: [2023-04-28 02:51:16,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 29: [2023-04-28 02:51:16,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-28 02:51:16,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 7: [2023-04-28 02:51:16,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-04-28 02:51:16,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-28 02:51:16,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 7: [2023-04-28 02:51:16,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-28 02:51:16,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-04-28 02:51:16,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 7: [2023-04-28 02:51:16,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-04-28 02:51:16,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-04-28 02:51:16,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 7: [2023-04-28 02:51:16,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 17: [2023-04-28 02:51:16,453] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 4: [2023-04-28 02:51:16,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 17: [2023-04-28 02:51:16,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 4: [2023-04-28 02:51:16,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 17: [2023-04-28 02:51:16,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 4: [2023-04-28 02:51:16,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 17: [2023-04-28 02:51:16,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 4: [2023-04-28 02:51:16,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 17: [2023-04-28 02:51:16,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
4: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 17: [2023-04-28 02:51:16,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 4: [2023-04-28 02:51:16,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 17: [2023-04-28 02:51:16,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 4: [2023-04-28 02:51:16,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 17: [2023-04-28 02:51:16,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 4: [2023-04-28 02:51:16,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-28 02:51:16,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-04-28 02:51:16,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 28: [2023-04-28 02:51:16,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-28 02:51:16,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-28 02:51:16,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
0: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-28 02:51:16,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 9: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 
9: [2023-04-28 02:51:16,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-28 02:51:16,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 9: [2023-04-28 02:51:16,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-28 02:51:16,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 9: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 9: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 9: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 17: [2023-04-28 02:51:16,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2023-04-28 02:51:16,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-28 02:51:16,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 17: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 
17: [2023-04-28 02:51:16,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-28 02:51:16,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 17: [2023-04-28 02:51:16,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2023-04-28 02:51:16,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-28 02:51:16,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 17: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2023-04-28 02:51:16,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-28 02:51:16,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 17: [2023-04-28 02:51:16,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 8: [2023-04-28 02:51:16,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 
27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2023-04-28 02:51:16,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-28 02:51:16,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-28 02:51:16,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-28 02:51:16,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
27: [2023-04-28 02:51:16,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-28 02:51:16,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-28 02:51:16,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 28: [2023-04-28 02:51:16,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 
27: [2023-04-28 02:51:16,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 28: [2023-04-28 02:51:16,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-28 02:51:16,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 27: [2023-04-28 02:51:16,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 28: [2023-04-28 02:51:16,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2023-04-28 02:51:16,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-28 02:51:16,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 10: [2023-04-28 02:51:16,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-28 02:51:16,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-28 02:51:16,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 9: [2023-04-28 02:51:16,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 
9: [2023-04-28 02:51:16,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-28 02:51:16,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 9: [2023-04-28 02:51:16,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2023-04-28 02:51:16,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 9: [2023-04-28 02:51:16,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 9: [2023-04-28 02:51:16,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-28 02:51:16,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-28 02:51:16,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-28 02:51:16,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 9: [2023-04-28 02:51:16,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 9: [2023-04-28 02:51:16,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
8: [2023-04-28 02:51:16,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-28 02:51:16,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 4: [2023-04-28 02:51:16,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 10: [2023-04-28 02:51:16,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2023-04-28 02:51:16,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2023-04-28 02:51:16,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 0: [2023-04-28 02:51:16,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-04-28 02:51:16,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 13: [2023-04-28 02:51:16,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-28 02:51:16,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-28 02:51:16,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 5: [2023-04-28 02:51:16,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-04-28 02:51:16,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 10: [2023-04-28 02:51:16,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-28 02:51:16,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2023-04-28 02:51:16,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 4: [2023-04-28 02:51:16,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-04-28 02:51:16,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-04-28 02:51:16,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-28 02:51:16,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 4: [2023-04-28 02:51:16,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 
23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 
23: [2023-04-28 02:51:16,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-28 02:51:16,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-28 02:51:16,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-28 02:51:16,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-28 02:51:16,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-28 02:51:16,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2023-04-28 02:51:16,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2023-04-28 02:51:16,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 23: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 15: [2023-04-28 02:51:16,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-28 02:51:16,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 15: [2023-04-28 02:51:16,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 15: [2023-04-28 02:51:16,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-28 02:51:16,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 
15: [2023-04-28 02:51:16,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-28 02:51:16,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-28 02:51:16,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-28 02:51:16,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2023-04-28 02:51:16,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2023-04-28 02:51:16,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 15: [2023-04-28 02:51:16,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 15: [2023-04-28 02:51:16,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 15: [2023-04-28 02:51:16,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 15: [2023-04-28 02:51:16,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 0: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-04-28 02:51:16,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 
11: [2023-04-28 02:51:16,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-28 02:51:16,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-28 02:51:16,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-28 02:51:16,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2023-04-28 02:51:16,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2023-04-28 02:51:16,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-28 02:51:16,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-28 02:51:16,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 11: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 15: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-28 02:51:16,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 15: [2023-04-28 02:51:16,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-28 02:51:16,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-28 02:51:16,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 15: [2023-04-28 02:51:16,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 
15: [2023-04-28 02:51:16,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-28 02:51:16,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 10: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 10: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-28 02:51:16,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 10: [2023-04-28 02:51:16,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 10: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 10: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 
14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2023-04-28 02:51:16,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-28 02:51:16,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-28 02:51:16,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-28 02:51:16,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-28 02:51:16,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2023-04-28 02:51:16,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-28 02:51:16,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt
14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 30: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 30: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 30: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 30: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 30: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 30: [2023-04-28 02:51:16,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-28 02:51:16,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-28 02:51:16,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-28 02:51:16,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-28 02:51:16,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 14: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 30: [2023-04-28 02:51:16,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-28 02:51:16,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-28 02:51:16,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-28 02:51:16,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
30: [2023-04-28 02:51:16,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 30: [2023-04-28 02:51:16,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 30: [2023-04-28 02:51:16,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 30: [2023-04-28 02:51:16,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 30: [2023-04-28 02:51:16,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 30: [2023-04-28 02:51:16,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 30: [2023-04-28 02:51:16,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-28 02:51:16,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 10: [2023-04-28 02:51:16,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 10: [2023-04-28 02:51:16,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2023-04-28 02:51:16,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 5: [2023-04-28 02:51:16,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-04-28 02:51:16,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-04-28 02:51:16,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-28 02:51:16,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-28 02:51:16,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 5: [2023-04-28 02:51:16,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 5: [2023-04-28 02:51:16,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-28 02:51:16,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-28 02:51:16,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 5: [2023-04-28 02:51:16,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 5: [2023-04-28 02:51:16,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-04-28 02:51:16,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-28 02:51:16,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 31: [2023-04-28 02:51:16,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 
31: [2023-04-28 02:51:16,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2023-04-28 02:51:16,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-28 02:51:16,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-28 02:51:16,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 31: [2023-04-28 02:51:16,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 13: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 5: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-28 02:51:16,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 13: [2023-04-28 02:51:16,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 5: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 5: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
31: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 31: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 5: [2023-04-28 02:51:16,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 31: [2023-04-28 02:51:16,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-28 02:51:16,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-28 02:51:16,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 31: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 31: [2023-04-28 02:51:16,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 31: [2023-04-28 02:51:16,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 
31: [2023-04-28 02:51:16,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-28 02:51:16,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-28 02:51:16,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-28 02:51:16,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 31: [2023-04-28 02:51:16,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 5: [2023-04-28 02:51:16,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-28 02:51:16,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-28 02:51:16,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 10: [2023-04-28 02:51:16,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-28 02:51:16,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-28 02:51:16,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 31: [2023-04-28 02:51:16,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 
31: [2023-04-28 02:51:16,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-28 02:51:16,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 14: [2023-04-28 02:51:16,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-28 02:51:16,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-28 02:51:16,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 7: [2023-04-28 02:51:16,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-04-28 02:51:16,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-28 02:51:16,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 4: [2023-04-28 02:51:16,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-04-28 02:51:16,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-04-28 02:51:16,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 4: [2023-04-28 02:51:16,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-04-28 02:51:16,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-28 02:51:16,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 
24: [2023-04-28 02:51:16,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-28 02:51:16,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-28 02:51:16,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-28 02:51:16,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-28 02:51:16,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2023-04-28 02:51:16,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 24: [2023-04-28 02:51:16,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
24: [2023-04-28 02:51:16,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 24: [2023-04-28 02:51:16,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 7: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 
20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 20: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 20: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 26: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 
26: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 26: [2023-04-28 02:51:16,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-28 02:51:16,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-28 02:51:16,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2023-04-28 02:51:16,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-28 02:51:16,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-28 02:51:16,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-28 02:51:16,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-28 02:51:16,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-28 02:51:16,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 26: [2023-04-28 02:51:16,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 26: [2023-04-28 02:51:16,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 26: [2023-04-28 02:51:16,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 26: [2023-04-28 02:51:16,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 26: [2023-04-28 02:51:16,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 26: [2023-04-28 02:51:16,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 26: [2023-04-28 02:51:16,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-28 02:51:16,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 6: [2023-04-28 02:51:16,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 6: [2023-04-28 02:51:16,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-28 02:51:16,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-28 02:51:16,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-04-28 02:51:16,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 6: [2023-04-28 02:51:16,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-28 02:51:16,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 
21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 
21: [2023-04-28 02:51:16,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-28 02:51:16,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-28 02:51:16,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-28 02:51:16,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-28 02:51:16,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-28 02:51:16,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-28 02:51:16,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-28 02:51:16,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 21: [2023-04-28 02:51:16,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 
18: [2023-04-28 02:51:16,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-28 02:51:16,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-28 02:51:16,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 18: [2023-04-28 02:51:16,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-28 02:51:16,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-28 02:51:16,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-28 02:51:16,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-28 02:51:16,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-28 02:51:16,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 
25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 25: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 25: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved 
checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 2: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 25: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 2: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
2: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-28 02:51:16,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 2: [2023-04-28 02:51:16,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 0: successfully saved checkpoint at iteration 280000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 2919.61 16: [2023-04-28 02:51:16,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-28 02:51:16,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-28 02:51:16,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
16: [2023-04-28 02:51:16,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 16: [2023-04-28 02:51:16,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-28 02:51:16,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 16: [2023-04-28 02:51:16,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-28 02:51:16,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 16: [2023-04-28 02:51:16,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-28 02:51:16,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-28 02:51:16,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 16: [2023-04-28 02:51:16,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 16: [2023-04-28 02:51:16,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2023-04-28 02:51:16,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-28 02:51:16,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 
16: [2023-04-28 02:51:16,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-28 02:51:16,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-28 02:51:16,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 16: [2023-04-28 02:51:16,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-28 02:51:16,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step280000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-28 02:51:16,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step280000 is ready now! 31: iteration 280100/ 476837 | consumed samples: 71705600 | consumed tokens: 146853068800 | elapsed time per iteration (s): 0.71 | learning rate: 8.673E-05 | global batch size: 256 | lm loss: 2.502853E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 360.163 | TFLOPs: 21.79 | 31: iteration 280200/ 476837 | consumed samples: 71731200 | consumed tokens: 146905497600 | elapsed time per iteration (s): 0.68 | learning rate: 8.667E-05 | global batch size: 256 | lm loss: 2.502146E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.118 | TFLOPs: 22.75 | 31: iteration 280300/ 476837 | consumed samples: 71756800 | consumed tokens: 146957926400 | elapsed time per iteration (s): 0.68 | learning rate: 8.661E-05 | global batch size: 256 | lm loss: 2.499783E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 376.805 | TFLOPs: 22.80 | 31: iteration 280400/ 476837 | consumed samples: 71782400 | consumed tokens: 147010355200 | elapsed time per iteration (s): 0.68 | learning rate: 8.656E-05 | global batch size: 256 | lm loss: 2.500949E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.970 | TFLOPs: 22.75 | 31: iteration 280500/ 476837 | consumed samples: 71808000 | consumed tokens: 147062784000 | elapsed time per iteration (s): 0.68 | learning rate: 8.650E-05 | global batch size: 256 | lm loss: 2.497564E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.326 | TFLOPs: 22.77 | 31: iteration 280600/ 476837 | consumed samples: 71833600 | consumed tokens: 147115212800 | elapsed time per iteration (s): 0.68 | learning rate: 8.644E-05 | global batch size: 256 | lm loss: 2.496399E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.248 | TFLOPs: 22.76 | 31: iteration 280700/ 476837 | consumed samples: 71859200 | consumed tokens: 147167641600 | elapsed time per iteration (s): 0.68 | learning rate: 8.638E-05 | global batch size: 256 | lm loss: 2.494360E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.243 | TFLOPs: 22.76 | 31: iteration 280800/ 476837 | consumed samples: 71884800 | consumed tokens: 147220070400 | elapsed time per iteration (s): 0.68 | learning rate: 8.633E-05 | global batch size: 256 | lm loss: 2.497330E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.635 | TFLOPs: 22.79 | 31: iteration 280900/ 476837 | consumed samples: 71910400 | consumed tokens: 147272499200 | elapsed time per iteration (s): 0.68 | learning rate: 8.627E-05 | global batch 
size: 256 | lm loss: 2.499149E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.810 | TFLOPs: 22.80 | 31: iteration 281000/ 476837 | consumed samples: 71936000 | consumed tokens: 147324928000 | elapsed time per iteration (s): 0.68 | learning rate: 8.621E-05 | global batch size: 256 | lm loss: 2.499719E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.328 | TFLOPs: 22.77 | 31: iteration 281100/ 476837 | consumed samples: 71961600 | consumed tokens: 147377356800 | elapsed time per iteration (s): 0.68 | learning rate: 8.615E-05 | global batch size: 256 | lm loss: 2.501554E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.393 | TFLOPs: 22.77 | 31: iteration 281200/ 476837 | consumed samples: 71987200 | consumed tokens: 147429785600 | elapsed time per iteration (s): 0.68 | learning rate: 8.609E-05 | global batch size: 256 | lm loss: 2.497192E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.949 | TFLOPs: 22.74 | 31: iteration 281300/ 476837 | consumed samples: 72012800 | consumed tokens: 147482214400 | elapsed time per iteration (s): 0.68 | learning rate: 8.604E-05 | global batch size: 256 | lm loss: 2.497340E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.852 | TFLOPs: 22.68 | 31: iteration 281400/ 476837 | consumed samples: 72038400 | consumed tokens: 147534643200 | elapsed time per iteration (s): 0.68 | learning rate: 8.598E-05 | global batch size: 256 | lm loss: 2.499946E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.994 | TFLOPs: 22.69 | 31: iteration 281500/ 476837 | consumed samples: 
72064000 | consumed tokens: 147587072000 | elapsed time per iteration (s): 0.68 | learning rate: 8.592E-05 | global batch size: 256 | lm loss: 2.499827E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.171 | TFLOPs: 22.70 | 31: iteration 281600/ 476837 | consumed samples: 72089600 | consumed tokens: 147639500800 | elapsed time per iteration (s): 0.68 | learning rate: 8.586E-05 | global batch size: 256 | lm loss: 2.505068E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.096 | TFLOPs: 22.69 | 31: iteration 281700/ 476837 | consumed samples: 72115200 | consumed tokens: 147691929600 | elapsed time per iteration (s): 0.68 | learning rate: 8.581E-05 | global batch size: 256 | lm loss: 2.500328E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.998 | TFLOPs: 22.75 | 31: iteration 281800/ 476837 | consumed samples: 72140800 | consumed tokens: 147744358400 | elapsed time per iteration (s): 0.68 | learning rate: 8.575E-05 | global batch size: 256 | lm loss: 2.497080E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.101 | TFLOPs: 22.81 | 31: iteration 281900/ 476837 | consumed samples: 72166400 | consumed tokens: 147796787200 | elapsed time per iteration (s): 0.69 | learning rate: 8.569E-05 | global batch size: 256 | lm loss: 2.499370E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.306 | TFLOPs: 22.58 | 0: [2023-04-28 03:14:12,485] [INFO] [logging.py:68:log_dist] [Rank 0] step=282000, skipped=0, lr=[8.563307569543742e-05, 8.563307569543742e-05, 8.563307569543742e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 282000/ 476837 | consumed samples: 72192000 | consumed 
tokens: 147849216000 | elapsed time per iteration (s): 0.82 | learning rate: 8.563E-05 | global batch size: 256 | lm loss: 2.498033E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 311.687 | TFLOPs: 18.86 | 0: steps: 282000 loss: 2.5076 iter time (s): 0.685 samples/sec: 373.459 31: iteration 282100/ 476837 | consumed samples: 72217600 | consumed tokens: 147901644800 | elapsed time per iteration (s): 0.68 | learning rate: 8.558E-05 | global batch size: 256 | lm loss: 2.494568E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.282 | TFLOPs: 22.76 | 31: iteration 282200/ 476837 | consumed samples: 72243200 | consumed tokens: 147954073600 | elapsed time per iteration (s): 0.68 | learning rate: 8.552E-05 | global batch size: 256 | lm loss: 2.500844E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.039 | TFLOPs: 22.75 | 31: iteration 282300/ 476837 | consumed samples: 72268800 | consumed tokens: 148006502400 | elapsed time per iteration (s): 0.68 | learning rate: 8.546E-05 | global batch size: 256 | lm loss: 2.498199E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.977 | TFLOPs: 22.81 | 31: iteration 282400/ 476837 | consumed samples: 72294400 | consumed tokens: 148058931200 | elapsed time per iteration (s): 0.68 | learning rate: 8.540E-05 | global batch size: 256 | lm loss: 2.496581E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.024 | TFLOPs: 22.81 | 31: iteration 282500/ 476837 | consumed samples: 72320000 | consumed tokens: 148111360000 | elapsed time per iteration (s): 0.68 | learning rate: 8.534E-05 | global batch size: 256 | lm loss: 2.497461E+00 | grad norm: 0.337 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.237 | TFLOPs: 22.76 | 31: iteration 282600/ 476837 | consumed samples: 72345600 | consumed tokens: 148163788800 | elapsed time per iteration (s): 0.68 | learning rate: 8.529E-05 | global batch size: 256 | lm loss: 2.496820E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.270 | TFLOPs: 22.76 | 31: iteration 282700/ 476837 | consumed samples: 72371200 | consumed tokens: 148216217600 | elapsed time per iteration (s): 0.68 | learning rate: 8.523E-05 | global batch size: 256 | lm loss: 2.504835E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.966 | TFLOPs: 22.75 | 31: iteration 282800/ 476837 | consumed samples: 72396800 | consumed tokens: 148268646400 | elapsed time per iteration (s): 0.68 | learning rate: 8.517E-05 | global batch size: 256 | lm loss: 2.502390E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.153 | TFLOPs: 22.76 | 31: iteration 282900/ 476837 | consumed samples: 72422400 | consumed tokens: 148321075200 | elapsed time per iteration (s): 0.68 | learning rate: 8.511E-05 | global batch size: 256 | lm loss: 2.497324E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.223 | TFLOPs: 22.76 | 31: iteration 283000/ 476837 | consumed samples: 72448000 | consumed tokens: 148373504000 | elapsed time per iteration (s): 0.68 | learning rate: 8.506E-05 | global batch size: 256 | lm loss: 2.495854E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.757 | TFLOPs: 22.79 | 31: iteration 283100/ 476837 | consumed samples: 72473600 | consumed tokens: 148425932800 | elapsed time per 
iteration (s): 0.68 | learning rate: 8.500E-05 | global batch size: 256 | lm loss: 2.498688E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.149 | TFLOPs: 22.70 | 31: iteration 283200/ 476837 | consumed samples: 72499200 | consumed tokens: 148478361600 | elapsed time per iteration (s): 0.68 | learning rate: 8.494E-05 | global batch size: 256 | lm loss: 2.494842E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.009 | TFLOPs: 22.75 | 31: iteration 283300/ 476837 | consumed samples: 72524800 | consumed tokens: 148530790400 | elapsed time per iteration (s): 0.68 | learning rate: 8.488E-05 | global batch size: 256 | lm loss: 2.494659E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.972 | TFLOPs: 22.75 | 31: iteration 283400/ 476837 | consumed samples: 72550400 | consumed tokens: 148583219200 | elapsed time per iteration (s): 0.68 | learning rate: 8.483E-05 | global batch size: 256 | lm loss: 2.499843E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.023 | TFLOPs: 22.81 | 31: iteration 283500/ 476837 | consumed samples: 72576000 | consumed tokens: 148635648000 | elapsed time per iteration (s): 0.68 | learning rate: 8.477E-05 | global batch size: 256 | lm loss: 2.497838E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.005 | TFLOPs: 22.75 | 31: iteration 283600/ 476837 | consumed samples: 72601600 | consumed tokens: 148688076800 | elapsed time per iteration (s): 0.68 | learning rate: 8.471E-05 | global batch size: 256 | lm loss: 2.495015E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.841 | 
TFLOPs: 22.68 | 31: iteration 283700/ 476837 | consumed samples: 72627200 | consumed tokens: 148740505600 | elapsed time per iteration (s): 0.68 | learning rate: 8.465E-05 | global batch size: 256 | lm loss: 2.498327E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.547 | TFLOPs: 22.78 | 31: iteration 283800/ 476837 | consumed samples: 72652800 | consumed tokens: 148792934400 | elapsed time per iteration (s): 0.68 | learning rate: 8.460E-05 | global batch size: 256 | lm loss: 2.495377E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.561 | TFLOPs: 22.78 | 31: iteration 283900/ 476837 | consumed samples: 72678400 | consumed tokens: 148845363200 | elapsed time per iteration (s): 0.68 | learning rate: 8.454E-05 | global batch size: 256 | lm loss: 2.495714E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.676 | TFLOPs: 22.79 | 0: [2023-04-28 03:36:53,232] [INFO] [logging.py:68:log_dist] [Rank 0] step=284000, skipped=0, lr=[8.448211705167207e-05, 8.448211705167207e-05, 8.448211705167207e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 284000/ 476837 | consumed samples: 72704000 | consumed tokens: 148897792000 | elapsed time per iteration (s): 0.68 | learning rate: 8.448E-05 | global batch size: 256 | lm loss: 2.497855E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.576 | TFLOPs: 22.78 | 0: steps: 284000 loss: 2.4772 iter time (s): 0.678 samples/sec: 377.645 31: iteration 284100/ 476837 | consumed samples: 72729600 | consumed tokens: 148950220800 | elapsed time per iteration (s): 0.68 | learning rate: 8.442E-05 | global batch size: 256 | lm loss: 2.491952E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 376.254 | TFLOPs: 22.76 | 31: iteration 284200/ 476837 | consumed samples: 72755200 | consumed tokens: 149002649600 | elapsed time per iteration (s): 0.68 | learning rate: 8.437E-05 | global batch size: 256 | lm loss: 2.493465E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.624 | TFLOPs: 22.78 | 31: iteration 284300/ 476837 | consumed samples: 72780800 | consumed tokens: 149055078400 | elapsed time per iteration (s): 0.68 | learning rate: 8.431E-05 | global batch size: 256 | lm loss: 2.499476E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.701 | TFLOPs: 22.79 | 31: iteration 284400/ 476837 | consumed samples: 72806400 | consumed tokens: 149107507200 | elapsed time per iteration (s): 0.69 | learning rate: 8.425E-05 | global batch size: 256 | lm loss: 2.492951E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.566 | TFLOPs: 22.54 | 31: iteration 284500/ 476837 | consumed samples: 72832000 | consumed tokens: 149159936000 | elapsed time per iteration (s): 0.68 | learning rate: 8.420E-05 | global batch size: 256 | lm loss: 2.494545E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.809 | TFLOPs: 22.74 | 31: iteration 284600/ 476837 | consumed samples: 72857600 | consumed tokens: 149212364800 | elapsed time per iteration (s): 0.68 | learning rate: 8.414E-05 | global batch size: 256 | lm loss: 2.496008E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.620 | TFLOPs: 22.78 | 31: iteration 284700/ 476837 | consumed samples: 72883200 | consumed tokens: 149264793600 | elapsed time per iteration (s): 0.77 | learning rate: 8.408E-05 | global 
batch size: 256 | lm loss: 2.496113E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 330.813 | TFLOPs: 20.01 | 31: iteration 284800/ 476837 | consumed samples: 72908800 | consumed tokens: 149317222400 | elapsed time per iteration (s): 0.74 | learning rate: 8.402E-05 | global batch size: 256 | lm loss: 2.493609E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 347.876 | TFLOPs: 21.05 | 31: iteration 284900/ 476837 | consumed samples: 72934400 | consumed tokens: 149369651200 | elapsed time per iteration (s): 0.68 | learning rate: 8.397E-05 | global batch size: 256 | lm loss: 2.494043E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.203 | TFLOPs: 22.70 | 31: iteration 285000/ 476837 | consumed samples: 72960000 | consumed tokens: 149422080000 | elapsed time per iteration (s): 0.68 | learning rate: 8.391E-05 | global batch size: 256 | lm loss: 2.494489E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.913 | TFLOPs: 22.80 | 31: iteration 285100/ 476837 | consumed samples: 72985600 | consumed tokens: 149474508800 | elapsed time per iteration (s): 0.68 | learning rate: 8.385E-05 | global batch size: 256 | lm loss: 2.497234E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.958 | TFLOPs: 22.81 | 31: iteration 285200/ 476837 | consumed samples: 73011200 | consumed tokens: 149526937600 | elapsed time per iteration (s): 0.68 | learning rate: 8.379E-05 | global batch size: 256 | lm loss: 2.496541E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.494 | TFLOPs: 22.78 | 31: iteration 285300/ 476837 | consumed 
samples: 73036800 | consumed tokens: 149579366400 | elapsed time per iteration (s): 0.68 | learning rate: 8.374E-05 | global batch size: 256 | lm loss: 2.490453E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.396 | TFLOPs: 22.77 | 31: iteration 285400/ 476837 | consumed samples: 73062400 | consumed tokens: 149631795200 | elapsed time per iteration (s): 0.68 | learning rate: 8.368E-05 | global batch size: 256 | lm loss: 2.497924E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.140 | TFLOPs: 22.76 | 31: iteration 285500/ 476837 | consumed samples: 73088000 | consumed tokens: 149684224000 | elapsed time per iteration (s): 0.68 | learning rate: 8.362E-05 | global batch size: 256 | lm loss: 2.488897E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.848 | TFLOPs: 22.80 | 31: iteration 285600/ 476837 | consumed samples: 73113600 | consumed tokens: 149736652800 | elapsed time per iteration (s): 0.68 | learning rate: 8.356E-05 | global batch size: 256 | lm loss: 2.495524E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.840 | TFLOPs: 22.80 | 31: iteration 285700/ 476837 | consumed samples: 73139200 | consumed tokens: 149789081600 | elapsed time per iteration (s): 0.68 | learning rate: 8.351E-05 | global batch size: 256 | lm loss: 2.498370E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.301 | TFLOPs: 22.77 | 31: iteration 285800/ 476837 | consumed samples: 73164800 | consumed tokens: 149841510400 | elapsed time per iteration (s): 0.68 | learning rate: 8.345E-05 | global batch size: 256 | lm loss: 2.494190E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 375.728 | TFLOPs: 22.73 | 31: iteration 285900/ 476837 | consumed samples: 73190400 | consumed tokens: 149893939200 | elapsed time per iteration (s): 0.68 | learning rate: 8.339E-05 | global batch size: 256 | lm loss: 2.493293E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.394 | TFLOPs: 22.77 | 0: [2023-04-28 03:59:49,112] [INFO] [logging.py:68:log_dist] [Rank 0] step=286000, skipped=0, lr=[8.333567891745073e-05, 8.333567891745073e-05, 8.333567891745073e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 286000/ 476837 | consumed samples: 73216000 | consumed tokens: 149946368000 | elapsed time per iteration (s): 0.68 | learning rate: 8.334E-05 | global batch size: 256 | lm loss: 2.491584E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.726 | TFLOPs: 22.79 | 0: steps: 286000 loss: 2.4603 iter time (s): 0.685 samples/sec: 373.533 31: iteration 286100/ 476837 | consumed samples: 73241600 | consumed tokens: 149998796800 | elapsed time per iteration (s): 0.68 | learning rate: 8.328E-05 | global batch size: 256 | lm loss: 2.497606E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.410 | TFLOPs: 22.77 | 31: iteration 286200/ 476837 | consumed samples: 73267200 | consumed tokens: 150051225600 | elapsed time per iteration (s): 0.71 | learning rate: 8.322E-05 | global batch size: 256 | lm loss: 2.491396E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 360.961 | TFLOPs: 21.84 | 31: iteration 286300/ 476837 | consumed samples: 73292800 | consumed tokens: 150103654400 | elapsed time per iteration (s): 0.68 | learning rate: 8.316E-05 | global batch size: 256 | lm loss: 2.493806E+00 | grad norm: 
0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.486 | TFLOPs: 22.78 | 31: iteration 286400/ 476837 | consumed samples: 73318400 | consumed tokens: 150156083200 | elapsed time per iteration (s): 0.68 | learning rate: 8.311E-05 | global batch size: 256 | lm loss: 2.497227E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.950 | TFLOPs: 22.80 | 31: iteration 286500/ 476837 | consumed samples: 73344000 | consumed tokens: 150208512000 | elapsed time per iteration (s): 0.68 | learning rate: 8.305E-05 | global batch size: 256 | lm loss: 2.493674E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.893 | TFLOPs: 22.80 | 31: iteration 286600/ 476837 | consumed samples: 73369600 | consumed tokens: 150260940800 | elapsed time per iteration (s): 0.68 | learning rate: 8.299E-05 | global batch size: 256 | lm loss: 2.494712E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.759 | TFLOPs: 22.79 | 31: iteration 286700/ 476837 | consumed samples: 73395200 | consumed tokens: 150313369600 | elapsed time per iteration (s): 0.68 | learning rate: 8.294E-05 | global batch size: 256 | lm loss: 2.492861E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.661 | TFLOPs: 22.73 | 31: iteration 286800/ 476837 | consumed samples: 73420800 | consumed tokens: 150365798400 | elapsed time per iteration (s): 0.68 | learning rate: 8.288E-05 | global batch size: 256 | lm loss: 2.493220E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.782 | TFLOPs: 22.73 | 31: iteration 286900/ 476837 | consumed samples: 73446400 | consumed tokens: 150418227200 | elapsed 
time per iteration (s): 0.68 | learning rate: 8.282E-05 | global batch size: 256 | lm loss: 2.492153E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.119 | TFLOPs: 22.75 | 31: iteration 287000/ 476837 | consumed samples: 73472000 | consumed tokens: 150470656000 | elapsed time per iteration (s): 0.68 | learning rate: 8.276E-05 | global batch size: 256 | lm loss: 2.497528E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.766 | TFLOPs: 22.67 | 31: iteration 287100/ 476837 | consumed samples: 73497600 | consumed tokens: 150523084800 | elapsed time per iteration (s): 0.69 | learning rate: 8.271E-05 | global batch size: 256 | lm loss: 2.493496E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.360 | TFLOPs: 22.41 | 31: iteration 287200/ 476837 | consumed samples: 73523200 | consumed tokens: 150575513600 | elapsed time per iteration (s): 0.68 | learning rate: 8.265E-05 | global batch size: 256 | lm loss: 2.492053E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.922 | TFLOPs: 22.68 | 31: iteration 287300/ 476837 | consumed samples: 73548800 | consumed tokens: 150627942400 | elapsed time per iteration (s): 0.68 | learning rate: 8.259E-05 | global batch size: 256 | lm loss: 2.489321E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.197 | TFLOPs: 22.70 | 31: iteration 287400/ 476837 | consumed samples: 73574400 | consumed tokens: 150680371200 | elapsed time per iteration (s): 0.68 | learning rate: 8.254E-05 | global batch size: 256 | lm loss: 2.496006E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.330 
| TFLOPs: 22.71 | 31: iteration 287500/ 476837 | consumed samples: 73600000 | consumed tokens: 150732800000 | elapsed time per iteration (s): 0.82 | learning rate: 8.248E-05 | global batch size: 256 | lm loss: 2.487596E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 313.693 | TFLOPs: 18.98 | 31: iteration 287600/ 476837 | consumed samples: 73625600 | consumed tokens: 150785228800 | elapsed time per iteration (s): 0.70 | learning rate: 8.242E-05 | global batch size: 256 | lm loss: 2.494145E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.708 | TFLOPs: 22.18 | 31: iteration 287700/ 476837 | consumed samples: 73651200 | consumed tokens: 150837657600 | elapsed time per iteration (s): 0.68 | learning rate: 8.236E-05 | global batch size: 256 | lm loss: 2.491303E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.113 | TFLOPs: 22.75 | 31: iteration 287800/ 476837 | consumed samples: 73676800 | consumed tokens: 150890086400 | elapsed time per iteration (s): 0.68 | learning rate: 8.231E-05 | global batch size: 256 | lm loss: 2.495749E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.853 | TFLOPs: 22.74 | 31: iteration 287900/ 476837 | consumed samples: 73702400 | consumed tokens: 150942515200 | elapsed time per iteration (s): 0.68 | learning rate: 8.225E-05 | global batch size: 256 | lm loss: 2.490219E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.220 | TFLOPs: 22.70 | 0: [2023-04-28 04:22:50,050] [INFO] [logging.py:68:log_dist] [Rank 0] step=288000, skipped=0, lr=[8.219396438503372e-05, 8.219396438503372e-05, 8.219396438503372e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 
31: iteration 288000/ 476837 | consumed samples: 73728000 | consumed tokens: 150994944000 | elapsed time per iteration (s): 0.68 | learning rate: 8.219E-05 | global batch size: 256 | lm loss: 2.490862E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.939 | TFLOPs: 22.80 | 0: steps: 288000 loss: 2.4569 iter time (s): 0.687 samples/sec: 372.526 31: iteration 288100/ 476837 | consumed samples: 73753600 | consumed tokens: 151047372800 | elapsed time per iteration (s): 0.68 | learning rate: 8.214E-05 | global batch size: 256 | lm loss: 2.489252E+00 | grad norm: 0.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.558 | TFLOPs: 22.78 | 31: iteration 288200/ 476837 | consumed samples: 73779200 | consumed tokens: 151099801600 | elapsed time per iteration (s): 0.68 | learning rate: 8.208E-05 | global batch size: 256 | lm loss: 2.496125E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.628 | TFLOPs: 22.79 | 31: iteration 288300/ 476837 | consumed samples: 73804800 | consumed tokens: 151152230400 | elapsed time per iteration (s): 0.92 | learning rate: 8.202E-05 | global batch size: 256 | lm loss: 2.492443E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 279.135 | TFLOPs: 16.89 | 31: iteration 288400/ 476837 | consumed samples: 73830400 | consumed tokens: 151204659200 | elapsed time per iteration (s): 0.75 | learning rate: 8.197E-05 | global batch size: 256 | lm loss: 2.496540E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 340.582 | TFLOPs: 20.60 | 31: iteration 288500/ 476837 | consumed samples: 73856000 | consumed tokens: 151257088000 | elapsed time per iteration (s): 0.68 | learning rate: 8.191E-05 | 
global batch size: 256 | lm loss: 2.488594E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.681 | TFLOPs: 22.79 | 31: iteration 288600/ 476837 | consumed samples: 73881600 | consumed tokens: 151309516800 | elapsed time per iteration (s): 0.68 | learning rate: 8.185E-05 | global batch size: 256 | lm loss: 2.495157E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.852 | TFLOPs: 22.80 | 31: iteration 288700/ 476837 | consumed samples: 73907200 | consumed tokens: 151361945600 | elapsed time per iteration (s): 0.68 | learning rate: 8.180E-05 | global batch size: 256 | lm loss: 2.490587E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.920 | TFLOPs: 22.74 | 31: iteration 288800/ 476837 | consumed samples: 73932800 | consumed tokens: 151414374400 | elapsed time per iteration (s): 0.68 | learning rate: 8.174E-05 | global batch size: 256 | lm loss: 2.492618E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.180 | TFLOPs: 22.76 | 31: iteration 288900/ 476837 | consumed samples: 73958400 | consumed tokens: 151466803200 | elapsed time per iteration (s): 0.68 | learning rate: 8.168E-05 | global batch size: 256 | lm loss: 2.494480E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.922 | TFLOPs: 22.68 | 31: iteration 289000/ 476837 | consumed samples: 73984000 | consumed tokens: 151519232000 | elapsed time per iteration (s): 0.68 | learning rate: 8.162E-05 | global batch size: 256 | lm loss: 2.494146E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.880 | TFLOPs: 22.80 | 31: iteration 289100/ 476837 | consumed 
samples: 74009600 | consumed tokens: 151571660800 | elapsed time per iteration (s): 0.68 | learning rate: 8.157E-05 | global batch size: 256 | lm loss: 2.490961E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.569 | TFLOPs: 22.78 | 31: iteration 289200/ 476837 | consumed samples: 74035200 | consumed tokens: 151624089600 | elapsed time per iteration (s): 0.68 | learning rate: 8.151E-05 | global batch size: 256 | lm loss: 2.492001E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.253 | TFLOPs: 22.64 | 31: iteration 289300/ 476837 | consumed samples: 74060800 | consumed tokens: 151676518400 | elapsed time per iteration (s): 0.68 | learning rate: 8.145E-05 | global batch size: 256 | lm loss: 2.493385E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.631 | TFLOPs: 22.79 | 31: iteration 289400/ 476837 | consumed samples: 74086400 | consumed tokens: 151728947200 | elapsed time per iteration (s): 0.68 | learning rate: 8.140E-05 | global batch size: 256 | lm loss: 2.495151E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.573 | TFLOPs: 22.78 | 31: iteration 289500/ 476837 | consumed samples: 74112000 | consumed tokens: 151781376000 | elapsed time per iteration (s): 0.68 | learning rate: 8.134E-05 | global batch size: 256 | lm loss: 2.489414E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.616 | TFLOPs: 22.78 | 31: iteration 289600/ 476837 | consumed samples: 74137600 | consumed tokens: 151833804800 | elapsed time per iteration (s): 0.68 | learning rate: 8.128E-05 | global batch size: 256 | lm loss: 2.485630E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 376.718 | TFLOPs: 22.79 | 31: iteration 289700/ 476837 | consumed samples: 74163200 | consumed tokens: 151886233600 | elapsed time per iteration (s): 0.68 | learning rate: 8.123E-05 | global batch size: 256 | lm loss: 2.490330E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.775 | TFLOPs: 22.79 | 31: iteration 289800/ 476837 | consumed samples: 74188800 | consumed tokens: 151938662400 | elapsed time per iteration (s): 0.68 | learning rate: 8.117E-05 | global batch size: 256 | lm loss: 2.493333E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.783 | TFLOPs: 22.79 | 31: iteration 289900/ 476837 | consumed samples: 74214400 | consumed tokens: 151991091200 | elapsed time per iteration (s): 0.68 | learning rate: 8.111E-05 | global batch size: 256 | lm loss: 2.490704E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.723 | TFLOPs: 22.79 | 0: [2023-04-28 04:46:01,422] [INFO] [logging.py:68:log_dist] [Rank 0] step=290000, skipped=0, lr=[8.105717570989228e-05, 8.105717570989228e-05, 8.105717570989228e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 290000/ 476837 | consumed samples: 74240000 | consumed tokens: 152043520000 | elapsed time per iteration (s): 0.68 | learning rate: 8.106E-05 | global batch size: 256 | lm loss: 2.491704E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.499 | TFLOPs: 22.72 | 0: steps: 290000 loss: 2.4592 iter time (s): 0.692 samples/sec: 370.150 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 290000 | lm loss value: 2.898096E+00 | lm loss PPL: 1.813957E+01 | 31: 
------------------------------------------------------------------------------------------------- 31: iteration 290100/ 476837 | consumed samples: 74265600 | consumed tokens: 152095948800 | elapsed time per iteration (s): 0.68 | learning rate: 8.100E-05 | global batch size: 256 | lm loss: 2.494862E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.739 | TFLOPs: 22.67 | 31: iteration 290200/ 476837 | consumed samples: 74291200 | consumed tokens: 152148377600 | elapsed time per iteration (s): 0.68 | learning rate: 8.094E-05 | global batch size: 256 | lm loss: 2.494314E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.501 | TFLOPs: 22.78 | 31: iteration 290300/ 476837 | consumed samples: 74316800 | consumed tokens: 152200806400 | elapsed time per iteration (s): 0.80 | learning rate: 8.089E-05 | global batch size: 256 | lm loss: 2.489649E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 318.200 | TFLOPs: 19.25 | 31: iteration 290400/ 476837 | consumed samples: 74342400 | consumed tokens: 152253235200 | elapsed time per iteration (s): 0.73 | learning rate: 8.083E-05 | global batch size: 256 | lm loss: 2.487439E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 351.401 | TFLOPs: 21.26 | 31: iteration 290500/ 476837 | consumed samples: 74368000 | consumed tokens: 152305664000 | elapsed time per iteration (s): 0.68 | learning rate: 8.077E-05 | global batch size: 256 | lm loss: 2.493925E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.161 | TFLOPs: 22.76 | 31: iteration 290600/ 476837 | consumed samples: 74393600 | consumed tokens: 152358092800 | elapsed time per iteration (s): 0.68 | 
learning rate: 8.072E-05 | global batch size: 256 | lm loss: 2.488261E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.328 | TFLOPs: 22.77 | 31: iteration 290700/ 476837 | consumed samples: 74419200 | consumed tokens: 152410521600 | elapsed time per iteration (s): 0.68 | learning rate: 8.066E-05 | global batch size: 256 | lm loss: 2.488654E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.930 | TFLOPs: 22.80 | 31: iteration 290800/ 476837 | consumed samples: 74444800 | consumed tokens: 152462950400 | elapsed time per iteration (s): 0.68 | learning rate: 8.060E-05 | global batch size: 256 | lm loss: 2.496763E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.082 | TFLOPs: 22.75 | 31: iteration 290900/ 476837 | consumed samples: 74470400 | consumed tokens: 152515379200 | elapsed time per iteration (s): 0.68 | learning rate: 8.055E-05 | global batch size: 256 | lm loss: 2.490599E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.273 | TFLOPs: 22.76 | 31: iteration 291000/ 476837 | consumed samples: 74496000 | consumed tokens: 152567808000 | elapsed time per iteration (s): 0.68 | learning rate: 8.049E-05 | global batch size: 256 | lm loss: 2.487867E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.919 | TFLOPs: 22.80 | 31: iteration 291100/ 476837 | consumed samples: 74521600 | consumed tokens: 152620236800 | elapsed time per iteration (s): 0.68 | learning rate: 8.043E-05 | global batch size: 256 | lm loss: 2.490065E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.894 | TFLOPs: 22.80 | 31: 
iteration 291200/ 476837 | consumed samples: 74547200 | consumed tokens: 152672665600 | elapsed time per iteration (s): 0.68 | learning rate: 8.038E-05 | global batch size: 256 | lm loss: 2.488944E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.912 | TFLOPs: 22.80 | 31: iteration 291300/ 476837 | consumed samples: 74572800 | consumed tokens: 152725094400 | elapsed time per iteration (s): 0.68 | learning rate: 8.032E-05 | global batch size: 256 | lm loss: 2.493601E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.226 | TFLOPs: 22.76 | 31: iteration 291400/ 476837 | consumed samples: 74598400 | consumed tokens: 152777523200 | elapsed time per iteration (s): 0.71 | learning rate: 8.026E-05 | global batch size: 256 | lm loss: 2.490140E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.204 | TFLOPs: 21.85 | 31: iteration 291500/ 476837 | consumed samples: 74624000 | consumed tokens: 152829952000 | elapsed time per iteration (s): 0.68 | learning rate: 8.021E-05 | global batch size: 256 | lm loss: 2.488601E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.887 | TFLOPs: 22.74 | 31: iteration 291600/ 476837 | consumed samples: 74649600 | consumed tokens: 152882380800 | elapsed time per iteration (s): 0.68 | learning rate: 8.015E-05 | global batch size: 256 | lm loss: 2.489960E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.778 | TFLOPs: 22.79 | 31: iteration 291700/ 476837 | consumed samples: 74675200 | consumed tokens: 152934809600 | elapsed time per iteration (s): 0.68 | learning rate: 8.009E-05 | global batch size: 256 | lm loss: 2.491076E+00 | grad norm: 0.331 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.290 | TFLOPs: 22.76 | 31: iteration 291800/ 476837 | consumed samples: 74700800 | consumed tokens: 152987238400 | elapsed time per iteration (s): 0.68 | learning rate: 8.004E-05 | global batch size: 256 | lm loss: 2.490574E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.659 | TFLOPs: 22.79 | 31: iteration 291900/ 476837 | consumed samples: 74726400 | consumed tokens: 153039667200 | elapsed time per iteration (s): 0.68 | learning rate: 7.998E-05 | global batch size: 256 | lm loss: 2.488353E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.321 | TFLOPs: 22.77 | 0: [2023-04-28 05:09:01,843] [INFO] [logging.py:68:log_dist] [Rank 0] step=292000, skipped=0, lr=[7.992551427487878e-05, 7.992551427487878e-05, 7.992551427487878e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 292000/ 476837 | consumed samples: 74752000 | consumed tokens: 153092096000 | elapsed time per iteration (s): 0.68 | learning rate: 7.993E-05 | global batch size: 256 | lm loss: 2.487915E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.760 | TFLOPs: 22.79 | 0: steps: 292000 loss: 2.5006 iter time (s): 0.687 samples/sec: 372.749 31: iteration 292100/ 476837 | consumed samples: 74777600 | consumed tokens: 153144524800 | elapsed time per iteration (s): 0.68 | learning rate: 7.987E-05 | global batch size: 256 | lm loss: 2.491483E+00 | grad norm: 0.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.706 | TFLOPs: 22.79 | 31: iteration 292200/ 476837 | consumed samples: 74803200 | consumed tokens: 153196953600 | elapsed time per iteration (s): 0.68 | learning rate: 7.981E-05 | global batch size: 256 | 
lm loss: 2.488257E+00 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.765 | TFLOPs: 22.79 | 31: iteration 292300/ 476837 | consumed samples: 74828800 | consumed tokens: 153249382400 | elapsed time per iteration (s): 0.68 | learning rate: 7.976E-05 | global batch size: 256 | lm loss: 2.491349E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.798 | TFLOPs: 22.80 | 31: iteration 292400/ 476837 | consumed samples: 74854400 | consumed tokens: 153301811200 | elapsed time per iteration (s): 0.68 | learning rate: 7.970E-05 | global batch size: 256 | lm loss: 2.489425E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.193 | TFLOPs: 22.76 | 31: iteration 292500/ 476837 | consumed samples: 74880000 | consumed tokens: 153354240000 | elapsed time per iteration (s): 0.68 | learning rate: 7.964E-05 | global batch size: 256 | lm loss: 2.490578E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.903 | TFLOPs: 22.74 | 31: iteration 292600/ 476837 | consumed samples: 74905600 | consumed tokens: 153406668800 | elapsed time per iteration (s): 0.68 | learning rate: 7.959E-05 | global batch size: 256 | lm loss: 2.490486E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.010 | TFLOPs: 22.69 | 31: iteration 292700/ 476837 | consumed samples: 74931200 | consumed tokens: 153459097600 | elapsed time per iteration (s): 0.69 | learning rate: 7.953E-05 | global batch size: 256 | lm loss: 2.485432E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.623 | TFLOPs: 22.60 | 31: iteration 292800/ 476837 | consumed samples: 74956800 | 
consumed tokens: 153511526400 | elapsed time per iteration (s): 0.68 | learning rate: 7.947E-05 | global batch size: 256 | lm loss: 2.487173E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.808 | TFLOPs: 22.67 | 31: iteration 292900/ 476837 | consumed samples: 74982400 | consumed tokens: 153563955200 | elapsed time per iteration (s): 0.68 | learning rate: 7.942E-05 | global batch size: 256 | lm loss: 2.491443E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.872 | TFLOPs: 22.68 | 31: iteration 293000/ 476837 | consumed samples: 75008000 | consumed tokens: 153616384000 | elapsed time per iteration (s): 0.68 | learning rate: 7.936E-05 | global batch size: 256 | lm loss: 2.484709E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.348 | TFLOPs: 22.65 | 31: iteration 293100/ 476837 | consumed samples: 75033600 | consumed tokens: 153668812800 | elapsed time per iteration (s): 0.71 | learning rate: 7.931E-05 | global batch size: 256 | lm loss: 2.489646E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.564 | TFLOPs: 21.87 | 31: iteration 293200/ 476837 | consumed samples: 75059200 | consumed tokens: 153721241600 | elapsed time per iteration (s): 0.84 | learning rate: 7.925E-05 | global batch size: 256 | lm loss: 2.487259E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 305.685 | TFLOPs: 18.49 | 31: iteration 293300/ 476837 | consumed samples: 75084800 | consumed tokens: 153773670400 | elapsed time per iteration (s): 0.68 | learning rate: 7.919E-05 | global batch size: 256 | lm loss: 2.486241E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 374.718 | TFLOPs: 22.67 | 31: iteration 293400/ 476837 | consumed samples: 75110400 | consumed tokens: 153826099200 | elapsed time per iteration (s): 0.71 | learning rate: 7.914E-05 | global batch size: 256 | lm loss: 2.484464E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 362.660 | TFLOPs: 21.94 | 31: iteration 293500/ 476837 | consumed samples: 75136000 | consumed tokens: 153878528000 | elapsed time per iteration (s): 0.68 | learning rate: 7.908E-05 | global batch size: 256 | lm loss: 2.488971E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.299 | TFLOPs: 22.77 | 31: iteration 293600/ 476837 | consumed samples: 75161600 | consumed tokens: 153930956800 | elapsed time per iteration (s): 0.68 | learning rate: 7.902E-05 | global batch size: 256 | lm loss: 2.490306E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.641 | TFLOPs: 22.79 | 31: iteration 293700/ 476837 | consumed samples: 75187200 | consumed tokens: 153983385600 | elapsed time per iteration (s): 0.68 | learning rate: 7.897E-05 | global batch size: 256 | lm loss: 2.484506E+00 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.513 | TFLOPs: 22.78 | 31: iteration 293800/ 476837 | consumed samples: 75212800 | consumed tokens: 154035814400 | elapsed time per iteration (s): 0.68 | learning rate: 7.891E-05 | global batch size: 256 | lm loss: 2.484245E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.865 | TFLOPs: 22.74 | 31: iteration 293900/ 476837 | consumed samples: 75238400 | consumed tokens: 154088243200 | elapsed time per iteration (s): 0.68 | learning rate: 7.886E-05 | global batch 
size: 256 | lm loss: 2.482384E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.214 | TFLOPs: 22.76 | 0: [2023-04-28 05:32:06,131] [INFO] [logging.py:68:log_dist] [Rank 0] step=294000, skipped=0, lr=[7.879918055455173e-05, 7.879918055455173e-05, 7.879918055455173e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 294000/ 476837 | consumed samples: 75264000 | consumed tokens: 154140672000 | elapsed time per iteration (s): 0.69 | learning rate: 7.880E-05 | global batch size: 256 | lm loss: 2.485328E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.420 | TFLOPs: 22.47 | 0: steps: 294000 loss: 2.4831 iter time (s): 0.689 samples/sec: 371.655 31: iteration 294100/ 476837 | consumed samples: 75289600 | consumed tokens: 154193100800 | elapsed time per iteration (s): 0.68 | learning rate: 7.874E-05 | global batch size: 256 | lm loss: 2.487419E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.404 | TFLOPs: 22.77 | 31: iteration 294200/ 476837 | consumed samples: 75315200 | consumed tokens: 154245529600 | elapsed time per iteration (s): 0.68 | learning rate: 7.869E-05 | global batch size: 256 | lm loss: 2.487370E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.722 | TFLOPs: 22.79 | 31: iteration 294300/ 476837 | consumed samples: 75340800 | consumed tokens: 154297958400 | elapsed time per iteration (s): 0.68 | learning rate: 7.863E-05 | global batch size: 256 | lm loss: 2.487550E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.361 | TFLOPs: 22.77 | 31: iteration 294400/ 476837 | consumed samples: 75366400 | consumed tokens: 154350387200 | elapsed time per iteration 
(s): 0.68 | learning rate: 7.857E-05 | global batch size: 256 | lm loss: 2.486752E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.006 | TFLOPs: 22.63 | 31: iteration 294500/ 476837 | consumed samples: 75392000 | consumed tokens: 154402816000 | elapsed time per iteration (s): 0.68 | learning rate: 7.852E-05 | global batch size: 256 | lm loss: 2.488412E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.135 | TFLOPs: 22.76 | 31: iteration 294600/ 476837 | consumed samples: 75417600 | consumed tokens: 154455244800 | elapsed time per iteration (s): 0.68 | learning rate: 7.846E-05 | global batch size: 256 | lm loss: 2.482803E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.047 | TFLOPs: 22.75 | 31: iteration 294700/ 476837 | consumed samples: 75443200 | consumed tokens: 154507673600 | elapsed time per iteration (s): 0.68 | learning rate: 7.841E-05 | global batch size: 256 | lm loss: 2.482775E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.927 | TFLOPs: 22.80 | 31: iteration 294800/ 476837 | consumed samples: 75468800 | consumed tokens: 154560102400 | elapsed time per iteration (s): 0.68 | learning rate: 7.835E-05 | global batch size: 256 | lm loss: 2.489238E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.130 | TFLOPs: 22.75 | 31: iteration 294900/ 476837 | consumed samples: 75494400 | consumed tokens: 154612531200 | elapsed time per iteration (s): 0.68 | learning rate: 7.829E-05 | global batch size: 256 | lm loss: 2.488056E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.032 | TFLOPs: 22.69 | 
31: iteration 295000/ 476837 | consumed samples: 75520000 | consumed tokens: 154664960000 | elapsed time per iteration (s): 0.68 | learning rate: 7.824E-05 | global batch size: 256 | lm loss: 2.492820E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.406 | TFLOPs: 22.77 | 31: iteration 295100/ 476837 | consumed samples: 75545600 | consumed tokens: 154717388800 | elapsed time per iteration (s): 0.68 | learning rate: 7.818E-05 | global batch size: 256 | lm loss: 2.487020E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.944 | TFLOPs: 22.80 | 31: iteration 295200/ 476837 | consumed samples: 75571200 | consumed tokens: 154769817600 | elapsed time per iteration (s): 0.68 | learning rate: 7.813E-05 | global batch size: 256 | lm loss: 2.487046E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.940 | TFLOPs: 22.80 | 31: iteration 295300/ 476837 | consumed samples: 75596800 | consumed tokens: 154822246400 | elapsed time per iteration (s): 0.68 | learning rate: 7.807E-05 | global batch size: 256 | lm loss: 2.485797E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.009 | TFLOPs: 22.81 | 31: iteration 295400/ 476837 | consumed samples: 75622400 | consumed tokens: 154874675200 | elapsed time per iteration (s): 0.68 | learning rate: 7.801E-05 | global batch size: 256 | lm loss: 2.482711E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.998 | TFLOPs: 22.81 | 31: iteration 295500/ 476837 | consumed samples: 75648000 | consumed tokens: 154927104000 | elapsed time per iteration (s): 0.68 | learning rate: 7.796E-05 | global batch size: 256 | lm loss: 2.484609E+00 | grad norm: 0.379 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.146 | TFLOPs: 22.76 | 31: iteration 295600/ 476837 | consumed samples: 75673600 | consumed tokens: 154979532800 | elapsed time per iteration (s): 0.68 | learning rate: 7.790E-05 | global batch size: 256 | lm loss: 2.487596E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.030 | TFLOPs: 22.81 | 31: iteration 295700/ 476837 | consumed samples: 75699200 | consumed tokens: 155031961600 | elapsed time per iteration (s): 0.68 | learning rate: 7.785E-05 | global batch size: 256 | lm loss: 2.479478E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.763 | TFLOPs: 22.79 | 31: iteration 295800/ 476837 | consumed samples: 75724800 | consumed tokens: 155084390400 | elapsed time per iteration (s): 0.68 | learning rate: 7.779E-05 | global batch size: 256 | lm loss: 2.487783E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.460 | TFLOPs: 22.71 | 31: iteration 295900/ 476837 | consumed samples: 75750400 | consumed tokens: 155136819200 | elapsed time per iteration (s): 0.68 | learning rate: 7.773E-05 | global batch size: 256 | lm loss: 2.483974E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.571 | TFLOPs: 22.78 | 0: [2023-04-28 05:54:58,160] [INFO] [logging.py:68:log_dist] [Rank 0] step=296000, skipped=0, lr=[7.767837407966149e-05, 7.767837407966149e-05, 7.767837407966149e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 296000/ 476837 | consumed samples: 75776000 | consumed tokens: 155189248000 | elapsed time per iteration (s): 0.79 | learning rate: 7.768E-05 | global batch size: 256 | lm loss: 2.485037E+00 | grad norm: 0.383 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 322.038 | TFLOPs: 19.48 | 0: steps: 296000 loss: 2.5048 iter time (s): 0.683 samples/sec: 375.069 31: iteration 296100/ 476837 | consumed samples: 75801600 | consumed tokens: 155241676800 | elapsed time per iteration (s): 0.76 | learning rate: 7.762E-05 | global batch size: 256 | lm loss: 2.485051E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 338.887 | TFLOPs: 20.50 | 31: iteration 296200/ 476837 | consumed samples: 75827200 | consumed tokens: 155294105600 | elapsed time per iteration (s): 0.68 | learning rate: 7.757E-05 | global batch size: 256 | lm loss: 2.484490E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.582 | TFLOPs: 22.78 | 31: iteration 296300/ 476837 | consumed samples: 75852800 | consumed tokens: 155346534400 | elapsed time per iteration (s): 0.68 | learning rate: 7.751E-05 | global batch size: 256 | lm loss: 2.484496E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.921 | TFLOPs: 22.80 | 31: iteration 296400/ 476837 | consumed samples: 75878400 | consumed tokens: 155398963200 | elapsed time per iteration (s): 0.68 | learning rate: 7.745E-05 | global batch size: 256 | lm loss: 2.484624E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.990 | TFLOPs: 22.81 | 31: iteration 296500/ 476837 | consumed samples: 75904000 | consumed tokens: 155451392000 | elapsed time per iteration (s): 0.68 | learning rate: 7.740E-05 | global batch size: 256 | lm loss: 2.489686E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.402 | TFLOPs: 22.77 | 31: iteration 296600/ 476837 | consumed samples: 75929600 | 
consumed tokens: 155503820800 | elapsed time per iteration (s): 0.68 | learning rate: 7.734E-05 | global batch size: 256 | lm loss: 2.483750E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.796 | TFLOPs: 22.80 | 31: iteration 296700/ 476837 | consumed samples: 75955200 | consumed tokens: 155556249600 | elapsed time per iteration (s): 0.68 | learning rate: 7.729E-05 | global batch size: 256 | lm loss: 2.485603E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.019 | TFLOPs: 22.81 | 31: iteration 296800/ 476837 | consumed samples: 75980800 | consumed tokens: 155608678400 | elapsed time per iteration (s): 0.68 | learning rate: 7.723E-05 | global batch size: 256 | lm loss: 2.483046E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.757 | TFLOPs: 22.79 | 31: iteration 296900/ 476837 | consumed samples: 76006400 | consumed tokens: 155661107200 | elapsed time per iteration (s): 0.68 | learning rate: 7.718E-05 | global batch size: 256 | lm loss: 2.481774E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.060 | TFLOPs: 22.81 | 31: iteration 297000/ 476837 | consumed samples: 76032000 | consumed tokens: 155713536000 | elapsed time per iteration (s): 0.68 | learning rate: 7.712E-05 | global batch size: 256 | lm loss: 2.485918E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.589 | TFLOPs: 22.78 | 31: iteration 297100/ 476837 | consumed samples: 76057600 | consumed tokens: 155765964800 | elapsed time per iteration (s): 0.68 | learning rate: 7.706E-05 | global batch size: 256 | lm loss: 2.486736E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 376.209 | TFLOPs: 22.76 | 31: iteration 297200/ 476837 | consumed samples: 76083200 | consumed tokens: 155818393600 | elapsed time per iteration (s): 0.68 | learning rate: 7.701E-05 | global batch size: 256 | lm loss: 2.485309E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.252 | TFLOPs: 22.76 | 31: iteration 297300/ 476837 | consumed samples: 76108800 | consumed tokens: 155870822400 | elapsed time per iteration (s): 0.68 | learning rate: 7.695E-05 | global batch size: 256 | lm loss: 2.484396E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.848 | TFLOPs: 22.80 | 31: iteration 297400/ 476837 | consumed samples: 76134400 | consumed tokens: 155923251200 | elapsed time per iteration (s): 0.68 | learning rate: 7.690E-05 | global batch size: 256 | lm loss: 2.480843E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.641 | TFLOPs: 22.79 | 31: iteration 297500/ 476837 | consumed samples: 76160000 | consumed tokens: 155975680000 | elapsed time per iteration (s): 0.68 | learning rate: 7.684E-05 | global batch size: 256 | lm loss: 2.483856E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.235 | TFLOPs: 22.76 | 31: iteration 297600/ 476837 | consumed samples: 76185600 | consumed tokens: 156028108800 | elapsed time per iteration (s): 0.68 | learning rate: 7.679E-05 | global batch size: 256 | lm loss: 2.482327E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.621 | TFLOPs: 22.78 | 31: iteration 297700/ 476837 | consumed samples: 76211200 | consumed tokens: 156080537600 | elapsed time per iteration (s): 0.68 | learning rate: 7.673E-05 | global batch 
size: 256 | lm loss: 2.484500E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.718 | TFLOPs: 22.79 | 31: iteration 297800/ 476837 | consumed samples: 76236800 | consumed tokens: 156132966400 | elapsed time per iteration (s): 0.68 | learning rate: 7.667E-05 | global batch size: 256 | lm loss: 2.477485E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.725 | TFLOPs: 22.79 | 31: iteration 297900/ 476837 | consumed samples: 76262400 | consumed tokens: 156185395200 | elapsed time per iteration (s): 0.68 | learning rate: 7.662E-05 | global batch size: 256 | lm loss: 2.479521E+00 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.761 | TFLOPs: 22.79 | 0: [2023-04-28 06:17:45,000] [INFO] [logging.py:68:log_dist] [Rank 0] step=298000, skipped=0, lr=[7.656329340180335e-05, 7.656329340180335e-05, 7.656329340180335e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 298000/ 476837 | consumed samples: 76288000 | consumed tokens: 156237824000 | elapsed time per iteration (s): 0.68 | learning rate: 7.656E-05 | global batch size: 256 | lm loss: 2.481695E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.689 | TFLOPs: 22.79 | 0: steps: 298000 loss: 2.4848 iter time (s): 0.680 samples/sec: 376.394 31: iteration 298100/ 476837 | consumed samples: 76313600 | consumed tokens: 156290252800 | elapsed time per iteration (s): 0.68 | learning rate: 7.651E-05 | global batch size: 256 | lm loss: 2.485428E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.877 | TFLOPs: 22.74 | 31: iteration 298200/ 476837 | consumed samples: 76339200 | consumed tokens: 156342681600 | elapsed time per iteration 
(s): 0.68 | learning rate: 7.645E-05 | global batch size: 256 | lm loss: 2.486001E+00 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.656 | TFLOPs: 22.79 | 31: iteration 298300/ 476837 | consumed samples: 76364800 | consumed tokens: 156395110400 | elapsed time per iteration (s): 0.68 | learning rate: 7.640E-05 | global batch size: 256 | lm loss: 2.483781E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.783 | TFLOPs: 22.67 | 31: iteration 298400/ 476837 | consumed samples: 76390400 | consumed tokens: 156447539200 | elapsed time per iteration (s): 0.68 | learning rate: 7.634E-05 | global batch size: 256 | lm loss: 2.479980E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.578 | TFLOPs: 22.72 | 31: iteration 298500/ 476837 | consumed samples: 76416000 | consumed tokens: 156499968000 | elapsed time per iteration (s): 0.68 | learning rate: 7.629E-05 | global batch size: 256 | lm loss: 2.481978E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.317 | TFLOPs: 22.71 | 31: iteration 298600/ 476837 | consumed samples: 76441600 | consumed tokens: 156552396800 | elapsed time per iteration (s): 0.68 | learning rate: 7.623E-05 | global batch size: 256 | lm loss: 2.485779E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.763 | TFLOPs: 22.73 | 31: iteration 298700/ 476837 | consumed samples: 76467200 | consumed tokens: 156604825600 | elapsed time per iteration (s): 0.68 | learning rate: 7.617E-05 | global batch size: 256 | lm loss: 2.484584E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.234 | TFLOPs: 22.70 | 
31: iteration 298800/ 476837 | consumed samples: 76492800 | consumed tokens: 156657254400 | elapsed time per iteration (s): 0.68 | learning rate: 7.612E-05 | global batch size: 256 | lm loss: 2.482249E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.317 | TFLOPs: 22.71 | 31: iteration 298900/ 476837 | consumed samples: 76518400 | consumed tokens: 156709683200 | elapsed time per iteration (s): 0.79 | learning rate: 7.606E-05 | global batch size: 256 | lm loss: 2.480636E+00 | grad norm: 0.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 325.897 | TFLOPs: 19.72 | 31: iteration 299000/ 476837 | consumed samples: 76544000 | consumed tokens: 156762112000 | elapsed time per iteration (s): 0.76 | learning rate: 7.601E-05 | global batch size: 256 | lm loss: 2.480056E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 334.738 | TFLOPs: 20.25 | 31: iteration 299100/ 476837 | consumed samples: 76569600 | consumed tokens: 156814540800 | elapsed time per iteration (s): 0.68 | learning rate: 7.595E-05 | global batch size: 256 | lm loss: 2.482770E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.097 | TFLOPs: 22.75 | 31: iteration 299200/ 476837 | consumed samples: 76595200 | consumed tokens: 156866969600 | elapsed time per iteration (s): 0.68 | learning rate: 7.590E-05 | global batch size: 256 | lm loss: 2.486993E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.323 | TFLOPs: 22.77 | 31: iteration 299300/ 476837 | consumed samples: 76620800 | consumed tokens: 156919398400 | elapsed time per iteration (s): 0.68 | learning rate: 7.584E-05 | global batch size: 256 | lm loss: 2.475341E+00 | grad norm: 0.445 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.031 | TFLOPs: 22.75 | 31: iteration 299400/ 476837 | consumed samples: 76646400 | consumed tokens: 156971827200 | elapsed time per iteration (s): 0.68 | learning rate: 7.579E-05 | global batch size: 256 | lm loss: 2.479285E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.056 | TFLOPs: 22.75 | 31: iteration 299500/ 476837 | consumed samples: 76672000 | consumed tokens: 157024256000 | elapsed time per iteration (s): 0.68 | learning rate: 7.573E-05 | global batch size: 256 | lm loss: 2.486019E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.564 | TFLOPs: 22.78 | 31: iteration 299600/ 476837 | consumed samples: 76697600 | consumed tokens: 157076684800 | elapsed time per iteration (s): 0.68 | learning rate: 7.568E-05 | global batch size: 256 | lm loss: 2.480735E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.371 | TFLOPs: 22.77 | 31: iteration 299700/ 476837 | consumed samples: 76723200 | consumed tokens: 157129113600 | elapsed time per iteration (s): 0.68 | learning rate: 7.562E-05 | global batch size: 256 | lm loss: 2.481003E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.721 | TFLOPs: 22.79 | 31: iteration 299800/ 476837 | consumed samples: 76748800 | consumed tokens: 157181542400 | elapsed time per iteration (s): 0.68 | learning rate: 7.556E-05 | global batch size: 256 | lm loss: 2.481638E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.732 | TFLOPs: 22.79 | 31: iteration 299900/ 476837 | consumed samples: 76774400 | consumed tokens: 157233971200 | elapsed time per 
iteration (s): 0.68 | learning rate: 7.551E-05 | global batch size: 256 | lm loss: 2.485119E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.712 | TFLOPs: 22.79 | 0: [2023-04-28 06:40:45,476] [INFO] [logging.py:68:log_dist] [Rank 0] step=300000, skipped=0, lr=[7.545413605824381e-05, 7.545413605824381e-05, 7.545413605824381e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 300000/ 476837 | consumed samples: 76800000 | consumed tokens: 157286400000 | elapsed time per iteration (s): 0.68 | learning rate: 7.545E-05 | global batch size: 256 | lm loss: 2.480271E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.369 | TFLOPs: 22.77 | 0: steps: 300000 loss: 2.4780 iter time (s): 0.687 samples/sec: 372.719 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 300000 | lm loss value: 2.953350E+00 | lm loss PPL: 1.917007E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 300000 to checkpoints_1b1250b1b5 0: [2023-04-28 06:40:45,741] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step300000 is begin to save! 0: [2023-04-28 06:40:45,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_01-model_00-model_states.pt... 0: [2023-04-28 06:40:46,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_01-model_00-model_states.pt. 0: [2023-04-28 06:40:46,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_03-model_00-model_states.pt... 
0: [2023-04-28 06:40:46,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_03-model_00-model_states.pt. 0: [2023-04-28 06:40:46,191] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_04-model_00-model_states.pt... 0: [2023-04-28 06:40:46,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_04-model_00-model_states.pt. 0: [2023-04-28 06:40:46,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_05-model_00-model_states.pt... 0: [2023-04-28 06:40:46,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_05-model_00-model_states.pt. 0: [2023-04-28 06:40:46,376] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_06-model_00-model_states.pt... 0: [2023-04-28 06:40:46,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_06-model_00-model_states.pt. 0: [2023-04-28 06:40:46,469] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_07-model_00-model_states.pt... 0: [2023-04-28 06:40:46,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_07-model_00-model_states.pt. 0: [2023-04-28 06:40:46,547] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_08-model_00-model_states.pt... 0: [2023-04-28 06:40:46,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_08-model_00-model_states.pt. 0: [2023-04-28 06:40:46,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_09-model_00-model_states.pt... 
0: [2023-04-28 06:40:46,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_09-model_00-model_states.pt. 0: [2023-04-28 06:40:46,713] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_10-model_00-model_states.pt... 0: [2023-04-28 06:40:46,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_10-model_00-model_states.pt. 0: [2023-04-28 06:40:46,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_11-model_00-model_states.pt... 0: [2023-04-28 06:40:46,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_11-model_00-model_states.pt. 0: [2023-04-28 06:40:46,893] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_12-model_00-model_states.pt... 0: [2023-04-28 06:40:46,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_12-model_00-model_states.pt. 0: [2023-04-28 06:40:46,972] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_13-model_00-model_states.pt... 0: [2023-04-28 06:40:47,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_13-model_00-model_states.pt. 0: [2023-04-28 06:40:47,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_14-model_00-model_states.pt... 0: [2023-04-28 06:40:47,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_14-model_00-model_states.pt. 0: [2023-04-28 06:40:47,139] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_15-model_00-model_states.pt... 
0: [2023-04-28 06:40:47,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_15-model_00-model_states.pt. 0: [2023-04-28 06:40:47,215] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_16-model_00-model_states.pt... 0: [2023-04-28 06:40:47,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_16-model_00-model_states.pt. 0: [2023-04-28 06:40:47,307] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_17-model_00-model_states.pt... 0: [2023-04-28 06:40:47,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_17-model_00-model_states.pt. 0: [2023-04-28 06:40:47,398] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_18-model_00-model_states.pt... 0: [2023-04-28 06:40:47,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_18-model_00-model_states.pt. 0: [2023-04-28 06:40:47,487] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_19-model_00-model_states.pt... 0: [2023-04-28 06:40:47,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_19-model_00-model_states.pt. 0: [2023-04-28 06:40:47,562] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_20-model_00-model_states.pt... 0: [2023-04-28 06:40:47,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_20-model_00-model_states.pt. 0: [2023-04-28 06:40:47,637] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_21-model_00-model_states.pt... 
0: [2023-04-28 06:40:47,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_21-model_00-model_states.pt. 0: [2023-04-28 06:40:47,725] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_22-model_00-model_states.pt... 0: [2023-04-28 06:40:47,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_22-model_00-model_states.pt. 0: [2023-04-28 06:40:47,815] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_23-model_00-model_states.pt... 0: [2023-04-28 06:40:47,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_23-model_00-model_states.pt. 0: [2023-04-28 06:40:47,893] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_24-model_00-model_states.pt... 0: [2023-04-28 06:40:47,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_24-model_00-model_states.pt. 0: [2023-04-28 06:40:47,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_25-model_00-model_states.pt... 0: [2023-04-28 06:40:48,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_25-model_00-model_states.pt. 0: [2023-04-28 06:40:48,059] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_26-model_00-model_states.pt... 0: [2023-04-28 06:40:48,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_26-model_00-model_states.pt. 0: [2023-04-28 06:40:48,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_27-model_00-model_states.pt... 
0: [2023-04-28 06:40:48,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_27-model_00-model_states.pt. 0: [2023-04-28 06:40:48,224] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_28-model_00-model_states.pt... 0: [2023-04-28 06:40:48,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_28-model_00-model_states.pt. 0: [2023-04-28 06:40:48,311] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/layer_30-model_00-model_states.pt... 0: [2023-04-28 06:40:48,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/layer_30-model_00-model_states.pt. 0: [2023-04-28 06:40:48,314] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step300000/mp_rank_00_model_states.pt 0: [2023-04-28 06:40:48,314] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/mp_rank_00_model_states.pt... 0: [2023-04-28 06:40:48,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/mp_rank_00_model_states.pt. 0: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
17: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 0: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 2: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 8: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 8: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 8: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 
11: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 11: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 11: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 3: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 10: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 10: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 10: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 14: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 
20: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 20: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 19: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 19: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 19: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 19: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 18: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 18: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 18: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 18: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 
24: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 24: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 24: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 24: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 27: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 23: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 23: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 23: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 23: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 
29: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 29: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 29: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 29: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 25: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 25: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 25: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 28: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 28: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 
28: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 28: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 30: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 30: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 30: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 31: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 31: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 31: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 31: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 16: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 
16: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 16: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 22: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 22: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 22: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 6: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 5: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 8: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 
8: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 8: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 11: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 3: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 10: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 9: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 14: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 14: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 
14: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 15: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 15: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 15: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 12: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 12: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 12: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 12: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 13: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 
13: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 20: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 18: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 18: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 18: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 17: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 17: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 17: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 27: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 21: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 
21: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 21: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 23: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 23: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 23: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 29: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 25: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 28: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 28: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 
26: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 26: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 26: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 26: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 30: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 31: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 31: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 16: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 22: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 6: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 1: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
2: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 8: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 8: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 11: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 11: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 11: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 10: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 10: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 9: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 14: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 14: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 
15: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 15: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 15: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 12: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 12: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 12: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 13: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 20: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 20: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 20: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 20: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 
19: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 19: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 19: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 19: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 18: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 24: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 24: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 24: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 17: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 17: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 
17: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 17: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 27: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 21: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 21: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 21: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 23: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 29: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 25: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 
25: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 25: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 28: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 26: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 26: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 30: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 30: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 30: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 30: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 31: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 
31: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 16: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 16: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 16: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 22: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 22: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 0: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 1: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 11: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 10: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 9: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 14: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 15: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 12: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 13: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 
20: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 24: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 27: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 27: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 27: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 21: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 29: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 28: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 26: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 26: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 
16: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 22: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 22: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 0: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 1: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 5: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 2: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 10: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 
9: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 14: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 13: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 13: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 27: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 9: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 9: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 9: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 9: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 13: [2023-04-28 06:40:48,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 11: [2023-04-28 06:40:48,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 
11: [2023-04-28 06:40:48,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-28 06:40:48,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 16: [2023-04-28 06:40:48,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 16: [2023-04-28 06:40:48,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-28 06:40:48,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 0: [2023-04-28 06:40:48,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-28 06:40:48,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 5: [2023-04-28 06:40:48,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-04-28 06:40:48,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-28 06:40:48,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
14: [2023-04-28 06:40:48,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2023-04-28 06:40:48,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-28 06:40:48,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 16: [2023-04-28 06:40:48,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 16: [2023-04-28 06:40:48,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-28 06:40:48,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 14: [2023-04-28 06:40:48,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-28 06:40:48,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-28 06:40:48,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 5: [2023-04-28 06:40:48,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 11: [2023-04-28 06:40:48,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 
5: [2023-04-28 06:40:48,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-28 06:40:48,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 11: [2023-04-28 06:40:48,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2023-04-28 06:40:48,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 11: [2023-04-28 06:40:48,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2023-04-28 06:40:48,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-28 06:40:48,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 16: [2023-04-28 06:40:48,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-28 06:40:48,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 14: [2023-04-28 06:40:48,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 
0: [2023-04-28 06:40:48,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-28 06:40:48,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 0: [2023-04-28 06:40:48,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 14: [2023-04-28 06:40:48,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 16: [2023-04-28 06:40:48,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 0: [2023-04-28 06:40:48,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 14: [2023-04-28 06:40:48,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 16: [2023-04-28 06:40:48,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 0: [2023-04-28 06:40:48,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 16: [2023-04-28 06:40:48,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 
0: [2023-04-28 06:40:48,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 16: [2023-04-28 06:40:48,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-28 06:40:48,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 16: [2023-04-28 06:40:48,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2023-04-28 06:40:48,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-28 06:40:48,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 16: [2023-04-28 06:40:48,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-28 06:40:48,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-28 06:40:48,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 16: [2023-04-28 06:40:48,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-28 06:40:48,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-28 06:40:48,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
11: [2023-04-28 06:40:48,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-28 06:40:48,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-28 06:40:48,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 14: [2023-04-28 06:40:48,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-28 06:40:48,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 16: [2023-04-28 06:40:48,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 8: [2023-04-28 06:40:48,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 
8: [2023-04-28 06:40:48,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-28 06:40:48,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-28 06:40:48,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 11: [2023-04-28 06:40:48,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 14: [2023-04-28 06:40:48,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 16: [2023-04-28 06:40:48,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 11: [2023-04-28 06:40:48,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 14: [2023-04-28 06:40:48,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 16: [2023-04-28 06:40:48,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 11: [2023-04-28 06:40:48,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 11: [2023-04-28 06:40:48,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
5: [2023-04-28 06:40:48,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-04-28 06:40:48,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-28 06:40:48,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 14: [2023-04-28 06:40:48,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 14: [2023-04-28 06:40:48,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-28 06:40:48,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 11: [2023-04-28 06:40:48,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-28 06:40:48,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-28 06:40:48,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 11: [2023-04-28 06:40:48,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-28 06:40:48,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-28 06:40:48,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
14: [2023-04-28 06:40:48,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 11: [2023-04-28 06:40:48,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 14: [2023-04-28 06:40:48,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-28 06:40:48,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 11: [2023-04-28 06:40:48,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-28 06:40:48,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 1: [2023-04-28 06:40:48,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-04-28 06:40:48,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-04-28 06:40:48,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-04-28 06:40:48,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-28 06:40:48,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-28 06:40:48,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 1: [2023-04-28 06:40:48,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 1: [2023-04-28 06:40:48,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-28 06:40:48,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 1: [2023-04-28 06:40:48,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-28 06:40:48,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-28 06:40:48,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 1: [2023-04-28 06:40:48,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-04-28 06:40:48,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-28 06:40:48,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
14: [2023-04-28 06:40:48,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-28 06:40:48,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-28 06:40:48,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2023-04-28 06:40:48,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 14: [2023-04-28 06:40:48,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-28 06:40:48,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 1: [2023-04-28 06:40:48,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-28 06:40:48,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-28 06:40:48,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 5: [2023-04-28 06:40:48,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-28 06:40:48,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-04-28 06:40:48,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-04-28 06:40:48,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-04-28 06:40:48,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-28 06:40:48,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 5: [2023-04-28 06:40:48,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-28 06:40:48,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 5: [2023-04-28 06:40:48,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 5: [2023-04-28 06:40:48,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-28 06:40:48,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-04-28 06:40:48,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-28 06:40:48,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-28 06:40:48,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
5: [2023-04-28 06:40:48,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 8: [2023-04-28 06:40:48,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-28 06:40:48,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-28 06:40:48,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 8: [2023-04-28 06:40:48,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 8: [2023-04-28 06:40:48,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 8: [2023-04-28 06:40:48,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-28 06:40:48,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 8: [2023-04-28 06:40:48,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 8: [2023-04-28 06:40:48,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-28 06:40:48,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 8: [2023-04-28 06:40:48,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 
8: [2023-04-28 06:40:48,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 8: [2023-04-28 06:40:48,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-28 06:40:48,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2023-04-28 06:40:48,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 8: [2023-04-28 06:40:48,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 8: [2023-04-28 06:40:48,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2023-04-28 06:40:48,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-28 06:40:48,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 8: [2023-04-28 06:40:48,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 8: [2023-04-28 06:40:48,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-28 06:40:48,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 17: [2023-04-28 06:40:48,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 
17: [2023-04-28 06:40:48,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-28 06:40:48,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 17: [2023-04-28 06:40:48,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2023-04-28 06:40:48,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2023-04-28 06:40:48,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2023-04-28 06:40:48,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 17: [2023-04-28 06:40:48,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-28 06:40:48,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 17: [2023-04-28 06:40:48,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 17: [2023-04-28 06:40:48,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 
17: [2023-04-28 06:40:48,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-28 06:40:48,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-28 06:40:48,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 17: [2023-04-28 06:40:48,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 1: [2023-04-28 06:40:48,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-04-28 06:40:48,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-28 06:40:48,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-04-28 06:40:48,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-28 06:40:48,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 1: [2023-04-28 06:40:48,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 7: [2023-04-28 06:40:48,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-28 06:40:48,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-04-28 06:40:48,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-04-28 06:40:48,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-28 06:40:48,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-28 06:40:48,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-28 06:40:48,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-28 06:40:48,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-04-28 06:40:48,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 7: [2023-04-28 06:40:48,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 7: [2023-04-28 06:40:48,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 7: [2023-04-28 06:40:48,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 17: [2023-04-28 06:40:48,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 
17: [2023-04-28 06:40:48,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-28 06:40:48,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 7: [2023-04-28 06:40:48,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-04-28 06:40:48,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-28 06:40:48,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 7: [2023-04-28 06:40:48,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-04-28 06:40:48,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-28 06:40:48,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 7: [2023-04-28 06:40:48,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-04-28 06:40:48,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-28 06:40:48,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 17: [2023-04-28 06:40:48,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 
17: [2023-04-28 06:40:48,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 
27: [2023-04-28 06:40:48,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-28 06:40:48,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-28 06:40:48,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-28 06:40:48,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2023-04-28 06:40:48,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
27: [2023-04-28 06:40:48,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-28 06:40:48,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 27: [2023-04-28 06:40:48,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 27: [2023-04-28 06:40:48,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 7: [2023-04-28 06:40:48,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-04-28 06:40:48,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-28 06:40:48,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 13: [2023-04-28 06:40:48,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 
13: [2023-04-28 06:40:48,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-28 06:40:48,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-28 06:40:48,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 
22: [2023-04-28 06:40:48,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-28 06:40:48,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-28 06:40:48,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 22: [2023-04-28 06:40:48,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-28 06:40:48,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2023-04-28 06:40:48,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 22: [2023-04-28 06:40:48,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-28 06:40:48,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 13: [2023-04-28 06:40:48,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-28 06:40:48,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-28 06:40:48,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 13: [2023-04-28 06:40:48,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 13: [2023-04-28 06:40:48,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2023-04-28 06:40:48,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2023-04-28 06:40:48,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
13: [2023-04-28 06:40:48,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-28 06:40:48,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 13: [2023-04-28 06:40:48,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2023-04-28 06:40:48,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-28 06:40:48,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-28 06:40:48,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 13: [2023-04-28 06:40:48,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-28 06:40:48,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 13: [2023-04-28 06:40:48,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-28 06:40:48,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-28 06:40:48,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2023-04-28 06:40:48,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
13: [2023-04-28 06:40:48,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2023-04-28 06:40:48,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 
15: [2023-04-28 06:40:48,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-28 06:40:48,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-28 06:40:48,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-28 06:40:48,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2023-04-28 06:40:48,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-28 06:40:48,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-28 06:40:48,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-28 06:40:48,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 15: [2023-04-28 06:40:48,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 20: [2023-04-28 06:40:48,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2023-04-28 06:40:48,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-28 06:40:48,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2023-04-28 06:40:48,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 20: [2023-04-28 06:40:48,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 
20: [2023-04-28 06:40:48,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-28 06:40:48,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-28 06:40:48,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-28 06:40:48,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-28 06:40:48,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-28 06:40:48,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 20: [2023-04-28 06:40:48,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 20: [2023-04-28 06:40:48,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 20: [2023-04-28 06:40:48,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 20: [2023-04-28 06:40:48,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 20: [2023-04-28 06:40:48,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 20: [2023-04-28 06:40:48,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 
20: [2023-04-28 06:40:48,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-28 06:40:48,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2023-04-28 06:40:48,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-28 06:40:48,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-28 06:40:48,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 20: [2023-04-28 06:40:48,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 20: [2023-04-28 06:40:48,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 17: [2023-04-28 06:40:48,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-28 06:40:48,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-28 06:40:48,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-04-28 06:40:48,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-28 06:40:48,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-28 06:40:48,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-28 06:40:48,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-28 06:40:48,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-28 06:40:48,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-28 06:40:48,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-28 06:40:48,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 3: [2023-04-28 06:40:48,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 25: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 25: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 25: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 29: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 
29: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 29: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 25: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 29: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 
25: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 29: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 29: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 25: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 29: [2023-04-28 06:40:48,626] [INFO] 
[engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 25: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 25: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 29: [2023-04-28 06:40:48,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 29: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 29: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 29: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 29: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 29: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 25: [2023-04-28 06:40:48,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 29: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
25: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 25: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 29: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 25: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 25: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 25: [2023-04-28 06:40:48,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 25: [2023-04-28 06:40:48,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 25: [2023-04-28 06:40:48,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-28 06:40:48,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-28 06:40:48,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 26: [2023-04-28 06:40:48,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-28 06:40:48,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 
26: [2023-04-28 06:40:48,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-28 06:40:48,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 26: [2023-04-28 06:40:48,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-28 06:40:48,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-28 06:40:48,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-28 06:40:48,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-28 06:40:48,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 26: [2023-04-28 06:40:48,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 26: [2023-04-28 06:40:48,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 26: [2023-04-28 06:40:48,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 26: [2023-04-28 06:40:48,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 
26: [2023-04-28 06:40:48,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 26: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 26: [2023-04-28 06:40:48,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 26: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-04-28 06:40:48,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 2: [2023-04-28 06:40:48,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-28 06:40:48,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-28 06:40:48,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-28 06:40:48,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-28 06:40:48,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-28 06:40:48,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-28 06:40:48,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: 
[2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 2: [2023-04-28 06:40:48,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 26: [2023-04-28 06:40:48,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-28 06:40:48,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-28 06:40:48,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 24: [2023-04-28 06:40:48,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-28 06:40:48,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-28 06:40:48,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 
24: [2023-04-28 06:40:48,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2023-04-28 06:40:48,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 24: [2023-04-28 06:40:48,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-28 06:40:48,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 24: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 24: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 24: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 24: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 24: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 24: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 24: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 24: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 19: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 
19: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 19: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-28 06:40:48,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-28 06:40:48,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 
19: [2023-04-28 06:40:48,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-28 06:40:48,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-28 06:40:48,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-28 06:40:48,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-28 06:40:48,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 19: [2023-04-28 06:40:48,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 19: [2023-04-28 06:40:48,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 19: [2023-04-28 06:40:48,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 19: [2023-04-28 06:40:48,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 19: [2023-04-28 06:40:48,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-28 06:40:48,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 19: [2023-04-28 06:40:48,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 19: [2023-04-28 06:40:48,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
26: [2023-04-28 06:40:48,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-28 06:40:48,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-28 06:40:48,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 
30: [2023-04-28 06:40:48,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-28 06:40:48,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-28 06:40:48,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-28 06:40:48,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-28 06:40:48,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 
30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-28 06:40:48,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-28 06:40:48,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-28 06:40:48,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 30: [2023-04-28 06:40:48,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-28 06:40:48,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 30: [2023-04-28 06:40:48,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 
21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 21: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready 
now! 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 21: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 21: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 
12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-28 06:40:48,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 12: [2023-04-28 06:40:48,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 
28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2023-04-28 06:40:48,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-28 06:40:48,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-28 06:40:48,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-28 06:40:48,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-28 06:40:48,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-28 06:40:48,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-28 06:40:48,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 28: [2023-04-28 06:40:48,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 
18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-28 06:40:48,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-28 06:40:48,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-28 06:40:48,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-28 06:40:48,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-28 06:40:48,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 18: [2023-04-28 06:40:48,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-28 06:40:48,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-28 06:40:48,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 18: [2023-04-28 06:40:48,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 23: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 23: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 23: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 
23: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 23: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 23: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 23: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 
9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 9: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 9: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 23: [2023-04-28 06:40:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2023-04-28 06:40:48,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2023-04-28 06:40:48,652] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-04-28 06:40:48,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-04-28 06:40:48,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 23: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 
4: [2023-04-28 06:40:48,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-04-28 06:40:48,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 23: [2023-04-28 06:40:48,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 4: [2023-04-28 06:40:48,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-28 06:40:48,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-28 06:40:48,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 23: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
4: [2023-04-28 06:40:48,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 4: [2023-04-28 06:40:48,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 23: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 23: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 23: [2023-04-28 06:40:48,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-28 06:40:48,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 23: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 
10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 
10: [2023-04-28 06:40:48,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-28 06:40:48,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2023-04-28 06:40:48,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2023-04-28 06:40:48,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-28 06:40:48,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-28 06:40:48,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 10: [2023-04-28 06:40:48,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 10: [2023-04-28 06:40:48,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-28 06:40:48,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2023-04-28 06:40:48,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 28: [2023-04-28 06:40:48,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2023-04-28 06:40:48,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-28 06:40:48,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 
31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2023-04-28 06:40:48,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-28 06:40:48,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-28 06:40:48,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-28 06:40:48,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-28 06:40:48,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-28 06:40:48,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-28 06:40:48,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-28 06:40:48,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 31: [2023-04-28 06:40:48,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-28 06:40:48,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-28 06:40:48,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-28 06:40:48,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-28 06:40:48,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-28 06:40:48,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-28 06:40:48,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-28 06:40:48,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 6: [2023-04-28 06:40:48,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step300000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 6: [2023-04-28 06:40:48,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step300000 is ready now! 
0: successfully saved checkpoint at iteration 300000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 2940.06 31: iteration 300100/ 476837 | consumed samples: 76825600 | consumed tokens: 157338828800 | elapsed time per iteration (s): 0.71 | learning rate: 7.540E-05 | global batch size: 256 | lm loss: 2.482931E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 359.310 | TFLOPs: 21.74 | 31: iteration 300200/ 476837 | consumed samples: 76851200 | consumed tokens: 157391257600 | elapsed time per iteration (s): 0.68 | learning rate: 7.534E-05 | global batch size: 256 | lm loss: 2.487218E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.610 | TFLOPs: 22.66 | 31: iteration 300300/ 476837 | consumed samples: 76876800 | consumed tokens: 157443686400 | elapsed time per iteration (s): 0.68 | learning rate: 7.529E-05 | global batch size: 256 | lm loss: 2.479150E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.731 | TFLOPs: 22.79 | 31: iteration 300400/ 476837 | consumed samples: 76902400 | consumed tokens: 157496115200 | elapsed time per iteration (s): 0.68 | learning rate: 7.523E-05 | global batch size: 256 | lm loss: 2.480589E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.751 | TFLOPs: 22.79 | 31: iteration 300500/ 476837 | consumed samples: 76928000 | consumed tokens: 157548544000 | elapsed time per iteration (s): 0.68 | learning rate: 7.518E-05 | global batch size: 256 | lm loss: 2.476098E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.346 | TFLOPs: 22.71 | 31: iteration 300600/ 476837 | consumed samples: 76953600 | consumed tokens: 157600972800 | elapsed time per 
iteration (s): 0.68 | learning rate: 7.512E-05 | global batch size: 256 | lm loss: 2.475784E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.228 | TFLOPs: 22.70 | 31: iteration 300700/ 476837 | consumed samples: 76979200 | consumed tokens: 157653401600 | elapsed time per iteration (s): 0.68 | learning rate: 7.507E-05 | global batch size: 256 | lm loss: 2.483318E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.288 | TFLOPs: 22.76 | 31: iteration 300800/ 476837 | consumed samples: 77004800 | consumed tokens: 157705830400 | elapsed time per iteration (s): 0.68 | learning rate: 7.501E-05 | global batch size: 256 | lm loss: 2.482365E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.694 | TFLOPs: 22.79 | 31: iteration 300900/ 476837 | consumed samples: 77030400 | consumed tokens: 157758259200 | elapsed time per iteration (s): 0.68 | learning rate: 7.496E-05 | global batch size: 256 | lm loss: 2.484400E+00 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.579 | TFLOPs: 22.78 | 31: iteration 301000/ 476837 | consumed samples: 77056000 | consumed tokens: 157810688000 | elapsed time per iteration (s): 0.68 | learning rate: 7.490E-05 | global batch size: 256 | lm loss: 2.475349E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.656 | TFLOPs: 22.79 | 31: iteration 301100/ 476837 | consumed samples: 77081600 | consumed tokens: 157863116800 | elapsed time per iteration (s): 0.68 | learning rate: 7.485E-05 | global batch size: 256 | lm loss: 2.481406E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.714 | 
TFLOPs: 22.79 | 31: iteration 301200/ 476837 | consumed samples: 77107200 | consumed tokens: 157915545600 | elapsed time per iteration (s): 0.68 | learning rate: 7.479E-05 | global batch size: 256 | lm loss: 2.478991E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.645 | TFLOPs: 22.79 | 31: iteration 301300/ 476837 | consumed samples: 77132800 | consumed tokens: 157967974400 | elapsed time per iteration (s): 0.68 | learning rate: 7.474E-05 | global batch size: 256 | lm loss: 2.480914E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.714 | TFLOPs: 22.79 | 31: iteration 301400/ 476837 | consumed samples: 77158400 | consumed tokens: 158020403200 | elapsed time per iteration (s): 0.69 | learning rate: 7.468E-05 | global batch size: 256 | lm loss: 2.482511E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.610 | TFLOPs: 22.54 | 31: iteration 301500/ 476837 | consumed samples: 77184000 | consumed tokens: 158072832000 | elapsed time per iteration (s): 0.68 | learning rate: 7.463E-05 | global batch size: 256 | lm loss: 2.480193E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.795 | TFLOPs: 22.73 | 31: iteration 301600/ 476837 | consumed samples: 77209600 | consumed tokens: 158125260800 | elapsed time per iteration (s): 0.68 | learning rate: 7.457E-05 | global batch size: 256 | lm loss: 2.478673E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.814 | TFLOPs: 22.74 | 31: iteration 301700/ 476837 | consumed samples: 77235200 | consumed tokens: 158177689600 | elapsed time per iteration (s): 0.69 | learning rate: 7.452E-05 | global batch size: 256 | lm loss: 2.475965E+00 | grad norm: 
0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.380 | TFLOPs: 22.59 | 31: iteration 301800/ 476837 | consumed samples: 77260800 | consumed tokens: 158230118400 | elapsed time per iteration (s): 0.71 | learning rate: 7.446E-05 | global batch size: 256 | lm loss: 2.478327E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 359.214 | TFLOPs: 21.73 | 31: iteration 301900/ 476837 | consumed samples: 77286400 | consumed tokens: 158282547200 | elapsed time per iteration (s): 0.86 | learning rate: 7.441E-05 | global batch size: 256 | lm loss: 2.478531E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 297.373 | TFLOPs: 17.99 | 0: [2023-04-28 07:03:52,306] [INFO] [logging.py:68:log_dist] [Rank 0] step=302000, skipped=0, lr=[7.4351098536927e-05, 7.4351098536927e-05, 7.4351098536927e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 302000/ 476837 | consumed samples: 77312000 | consumed tokens: 158334976000 | elapsed time per iteration (s): 0.68 | learning rate: 7.435E-05 | global batch size: 256 | lm loss: 2.481281E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.206 | TFLOPs: 22.70 | 0: steps: 302000 loss: 2.4216 iter time (s): 0.689 samples/sec: 371.363 31: iteration 302100/ 476837 | consumed samples: 77337600 | consumed tokens: 158387404800 | elapsed time per iteration (s): 0.68 | learning rate: 7.430E-05 | global batch size: 256 | lm loss: 2.479495E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.988 | TFLOPs: 22.75 | 31: iteration 302200/ 476837 | consumed samples: 77363200 | consumed tokens: 158439833600 | elapsed time per iteration (s): 0.69 | learning rate: 7.424E-05 | global batch 
size: 256 | lm loss: 2.473925E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.407 | TFLOPs: 22.59 | 31: iteration 302300/ 476837 | consumed samples: 77388800 | consumed tokens: 158492262400 | elapsed time per iteration (s): 0.68 | learning rate: 7.419E-05 | global batch size: 256 | lm loss: 2.480387E+00 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.547 | TFLOPs: 22.78 | 31: iteration 302400/ 476837 | consumed samples: 77414400 | consumed tokens: 158544691200 | elapsed time per iteration (s): 0.68 | learning rate: 7.413E-05 | global batch size: 256 | lm loss: 2.481527E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.477 | TFLOPs: 22.78 | 31: iteration 302500/ 476837 | consumed samples: 77440000 | consumed tokens: 158597120000 | elapsed time per iteration (s): 0.68 | learning rate: 7.408E-05 | global batch size: 256 | lm loss: 2.476620E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.363 | TFLOPs: 22.77 | 31: iteration 302600/ 476837 | consumed samples: 77465600 | consumed tokens: 158649548800 | elapsed time per iteration (s): 0.68 | learning rate: 7.402E-05 | global batch size: 256 | lm loss: 2.481352E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.766 | TFLOPs: 22.79 | 31: iteration 302700/ 476837 | consumed samples: 77491200 | consumed tokens: 158701977600 | elapsed time per iteration (s): 0.68 | learning rate: 7.397E-05 | global batch size: 256 | lm loss: 2.478209E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.850 | TFLOPs: 22.74 | 31: iteration 302800/ 476837 | consumed samples: 
77516800 | consumed tokens: 158754406400 | elapsed time per iteration (s): 0.68 | learning rate: 7.391E-05 | global batch size: 256 | lm loss: 2.478222E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.307 | TFLOPs: 22.77 | 31: iteration 302900/ 476837 | consumed samples: 77542400 | consumed tokens: 158806835200 | elapsed time per iteration (s): 0.68 | learning rate: 7.386E-05 | global batch size: 256 | lm loss: 2.478406E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.739 | TFLOPs: 22.79 | 31: iteration 303000/ 476837 | consumed samples: 77568000 | consumed tokens: 158859264000 | elapsed time per iteration (s): 0.68 | learning rate: 7.380E-05 | global batch size: 256 | lm loss: 2.478285E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.790 | TFLOPs: 22.79 | 31: iteration 303100/ 476837 | consumed samples: 77593600 | consumed tokens: 158911692800 | elapsed time per iteration (s): 0.68 | learning rate: 7.375E-05 | global batch size: 256 | lm loss: 2.480208E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.107 | TFLOPs: 22.75 | 31: iteration 303200/ 476837 | consumed samples: 77619200 | consumed tokens: 158964121600 | elapsed time per iteration (s): 0.68 | learning rate: 7.369E-05 | global batch size: 256 | lm loss: 2.479131E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.014 | TFLOPs: 22.75 | 31: iteration 303300/ 476837 | consumed samples: 77644800 | consumed tokens: 159016550400 | elapsed time per iteration (s): 0.68 | learning rate: 7.364E-05 | global batch size: 256 | lm loss: 2.478809E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 376.651 | TFLOPs: 22.79 | 31: iteration 303400/ 476837 | consumed samples: 77670400 | consumed tokens: 159068979200 | elapsed time per iteration (s): 0.68 | learning rate: 7.358E-05 | global batch size: 256 | lm loss: 2.481143E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.797 | TFLOPs: 22.80 | 31: iteration 303500/ 476837 | consumed samples: 77696000 | consumed tokens: 159121408000 | elapsed time per iteration (s): 0.68 | learning rate: 7.353E-05 | global batch size: 256 | lm loss: 2.476937E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.815 | TFLOPs: 22.80 | 31: iteration 303600/ 476837 | consumed samples: 77721600 | consumed tokens: 159173836800 | elapsed time per iteration (s): 0.68 | learning rate: 7.347E-05 | global batch size: 256 | lm loss: 2.480046E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.793 | TFLOPs: 22.79 | 31: iteration 303700/ 476837 | consumed samples: 77747200 | consumed tokens: 159226265600 | elapsed time per iteration (s): 0.68 | learning rate: 7.342E-05 | global batch size: 256 | lm loss: 2.480977E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.536 | TFLOPs: 22.78 | 31: iteration 303800/ 476837 | consumed samples: 77772800 | consumed tokens: 159278694400 | elapsed time per iteration (s): 0.68 | learning rate: 7.336E-05 | global batch size: 256 | lm loss: 2.479900E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.816 | TFLOPs: 22.74 | 31: iteration 303900/ 476837 | consumed samples: 77798400 | consumed tokens: 159331123200 | elapsed time per iteration (s): 0.68 | learning rate: 7.331E-05 | global 
batch size: 256 | lm loss: 2.481824E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.464 | TFLOPs: 22.78 | 0: [2023-04-28 07:26:33,294] [INFO] [logging.py:68:log_dist] [Rank 0] step=304000, skipped=0, lr=[7.325437624166644e-05, 7.325437624166644e-05, 7.325437624166644e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 304000/ 476837 | consumed samples: 77824000 | consumed tokens: 159383552000 | elapsed time per iteration (s): 0.68 | learning rate: 7.325E-05 | global batch size: 256 | lm loss: 2.479666E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.738 | TFLOPs: 22.67 | 0: steps: 304000 loss: 2.5084 iter time (s): 0.678 samples/sec: 377.574 31: iteration 304100/ 476837 | consumed samples: 77849600 | consumed tokens: 159435980800 | elapsed time per iteration (s): 0.68 | learning rate: 7.320E-05 | global batch size: 256 | lm loss: 2.477218E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.242 | TFLOPs: 22.64 | 31: iteration 304200/ 476837 | consumed samples: 77875200 | consumed tokens: 159488409600 | elapsed time per iteration (s): 0.69 | learning rate: 7.315E-05 | global batch size: 256 | lm loss: 2.478792E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.031 | TFLOPs: 22.57 | 31: iteration 304300/ 476837 | consumed samples: 77900800 | consumed tokens: 159540838400 | elapsed time per iteration (s): 0.68 | learning rate: 7.309E-05 | global batch size: 256 | lm loss: 2.477187E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.267 | TFLOPs: 22.70 | 31: iteration 304400/ 476837 | consumed samples: 77926400 | consumed tokens: 159593267200 | elapsed time per 
iteration (s): 0.68 | learning rate: 7.304E-05 | global batch size: 256 | lm loss: 2.482360E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.420 | TFLOPs: 22.71 | 31: iteration 304500/ 476837 | consumed samples: 77952000 | consumed tokens: 159645696000 | elapsed time per iteration (s): 0.68 | learning rate: 7.298E-05 | global batch size: 256 | lm loss: 2.474248E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.936 | TFLOPs: 22.74 | 31: iteration 304600/ 476837 | consumed samples: 77977600 | consumed tokens: 159698124800 | elapsed time per iteration (s): 0.68 | learning rate: 7.293E-05 | global batch size: 256 | lm loss: 2.480443E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.858 | TFLOPs: 22.62 | 31: iteration 304700/ 476837 | consumed samples: 78003200 | consumed tokens: 159750553600 | elapsed time per iteration (s): 0.68 | learning rate: 7.287E-05 | global batch size: 256 | lm loss: 2.481985E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.952 | TFLOPs: 22.74 | 31: iteration 304800/ 476837 | consumed samples: 78028800 | consumed tokens: 159802982400 | elapsed time per iteration (s): 0.81 | learning rate: 7.282E-05 | global batch size: 256 | lm loss: 2.481866E+00 | grad norm: 0.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 315.325 | TFLOPs: 19.08 | 31: iteration 304900/ 476837 | consumed samples: 78054400 | consumed tokens: 159855411200 | elapsed time per iteration (s): 0.76 | learning rate: 7.276E-05 | global batch size: 256 | lm loss: 2.476416E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 335.125 | 
TFLOPs: 20.27 | 31: iteration 305000/ 476837 | consumed samples: 78080000 | consumed tokens: 159907840000 | elapsed time per iteration (s): 0.68 | learning rate: 7.271E-05 | global batch size: 256 | lm loss: 2.473175E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.394 | TFLOPs: 22.77 | 31: iteration 305100/ 476837 | consumed samples: 78105600 | consumed tokens: 159960268800 | elapsed time per iteration (s): 0.68 | learning rate: 7.265E-05 | global batch size: 256 | lm loss: 2.479334E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.171 | TFLOPs: 22.76 | 31: iteration 305200/ 476837 | consumed samples: 78131200 | consumed tokens: 160012697600 | elapsed time per iteration (s): 0.68 | learning rate: 7.260E-05 | global batch size: 256 | lm loss: 2.473748E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.302 | TFLOPs: 22.77 | 31: iteration 305300/ 476837 | consumed samples: 78156800 | consumed tokens: 160065126400 | elapsed time per iteration (s): 0.68 | learning rate: 7.254E-05 | global batch size: 256 | lm loss: 2.473728E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.521 | TFLOPs: 22.78 | 31: iteration 305400/ 476837 | consumed samples: 78182400 | consumed tokens: 160117555200 | elapsed time per iteration (s): 0.68 | learning rate: 7.249E-05 | global batch size: 256 | lm loss: 2.476668E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.593 | TFLOPs: 22.78 | 31: iteration 305500/ 476837 | consumed samples: 78208000 | consumed tokens: 160169984000 | elapsed time per iteration (s): 0.68 | learning rate: 7.244E-05 | global batch size: 256 | lm loss: 2.474628E+00 | grad norm: 
0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.709 | TFLOPs: 22.79 | 31: iteration 305600/ 476837 | consumed samples: 78233600 | consumed tokens: 160222412800 | elapsed time per iteration (s): 0.68 | learning rate: 7.238E-05 | global batch size: 256 | lm loss: 2.472971E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.430 | TFLOPs: 22.77 | 31: iteration 305700/ 476837 | consumed samples: 78259200 | consumed tokens: 160274841600 | elapsed time per iteration (s): 0.68 | learning rate: 7.233E-05 | global batch size: 256 | lm loss: 2.475954E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.970 | TFLOPs: 22.75 | 31: iteration 305800/ 476837 | consumed samples: 78284800 | consumed tokens: 160327270400 | elapsed time per iteration (s): 0.68 | learning rate: 7.227E-05 | global batch size: 256 | lm loss: 2.473656E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.027 | TFLOPs: 22.75 | 31: iteration 305900/ 476837 | consumed samples: 78310400 | consumed tokens: 160379699200 | elapsed time per iteration (s): 0.68 | learning rate: 7.222E-05 | global batch size: 256 | lm loss: 2.477710E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.649 | TFLOPs: 22.67 | 0: [2023-04-28 07:49:37,484] [INFO] [logging.py:68:log_dist] [Rank 0] step=306000, skipped=0, lr=[7.216416345752938e-05, 7.216416345752938e-05, 7.216416345752938e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 306000/ 476837 | consumed samples: 78336000 | consumed tokens: 160432128000 | elapsed time per iteration (s): 0.68 | learning rate: 7.216E-05 | global batch size: 256 | lm loss: 2.475769E+00 | grad norm: 0.403 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.615 | TFLOPs: 22.78 | 0: steps: 306000 loss: 2.4383 iter time (s): 0.689 samples/sec: 371.313 31: iteration 306100/ 476837 | consumed samples: 78361600 | consumed tokens: 160484556800 | elapsed time per iteration (s): 0.68 | learning rate: 7.211E-05 | global batch size: 256 | lm loss: 2.475861E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.650 | TFLOPs: 22.79 | 31: iteration 306200/ 476837 | consumed samples: 78387200 | consumed tokens: 160536985600 | elapsed time per iteration (s): 0.68 | learning rate: 7.206E-05 | global batch size: 256 | lm loss: 2.475569E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.523 | TFLOPs: 22.66 | 31: iteration 306300/ 476837 | consumed samples: 78412800 | consumed tokens: 160589414400 | elapsed time per iteration (s): 0.68 | learning rate: 7.200E-05 | global batch size: 256 | lm loss: 2.476615E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.362 | TFLOPs: 22.77 | 31: iteration 306400/ 476837 | consumed samples: 78438400 | consumed tokens: 160641843200 | elapsed time per iteration (s): 0.68 | learning rate: 7.195E-05 | global batch size: 256 | lm loss: 2.473663E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.637 | TFLOPs: 22.79 | 31: iteration 306500/ 476837 | consumed samples: 78464000 | consumed tokens: 160694272000 | elapsed time per iteration (s): 0.68 | learning rate: 7.189E-05 | global batch size: 256 | lm loss: 2.470690E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.218 | TFLOPs: 22.76 | 31: iteration 306600/ 476837 | consumed 
samples: 78489600 | consumed tokens: 160746700800 | elapsed time per iteration (s): 0.68 | learning rate: 7.184E-05 | global batch size: 256 | lm loss: 2.477911E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.555 | TFLOPs: 22.78 | 31: iteration 306700/ 476837 | consumed samples: 78515200 | consumed tokens: 160799129600 | elapsed time per iteration (s): 0.68 | learning rate: 7.178E-05 | global batch size: 256 | lm loss: 2.475850E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.648 | TFLOPs: 22.79 | 31: iteration 306800/ 476837 | consumed samples: 78540800 | consumed tokens: 160851558400 | elapsed time per iteration (s): 0.68 | learning rate: 7.173E-05 | global batch size: 256 | lm loss: 2.478245E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.393 | TFLOPs: 22.77 | 31: iteration 306900/ 476837 | consumed samples: 78566400 | consumed tokens: 160903987200 | elapsed time per iteration (s): 0.68 | learning rate: 7.168E-05 | global batch size: 256 | lm loss: 2.472595E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.033 | TFLOPs: 22.75 | 31: iteration 307000/ 476837 | consumed samples: 78592000 | consumed tokens: 160956416000 | elapsed time per iteration (s): 0.68 | learning rate: 7.162E-05 | global batch size: 256 | lm loss: 2.477743E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.535 | TFLOPs: 22.78 | 31: iteration 307100/ 476837 | consumed samples: 78617600 | consumed tokens: 161008844800 | elapsed time per iteration (s): 0.69 | learning rate: 7.157E-05 | global batch size: 256 | lm loss: 2.476390E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 373.494 | TFLOPs: 22.60 | 31: iteration 307200/ 476837 | consumed samples: 78643200 | consumed tokens: 161061273600 | elapsed time per iteration (s): 0.68 | learning rate: 7.151E-05 | global batch size: 256 | lm loss: 2.475925E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.875 | TFLOPs: 22.74 | 31: iteration 307300/ 476837 | consumed samples: 78668800 | consumed tokens: 161113702400 | elapsed time per iteration (s): 0.68 | learning rate: 7.146E-05 | global batch size: 256 | lm loss: 2.475202E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.931 | TFLOPs: 22.74 | 31: iteration 307400/ 476837 | consumed samples: 78694400 | consumed tokens: 161166131200 | elapsed time per iteration (s): 0.69 | learning rate: 7.140E-05 | global batch size: 256 | lm loss: 2.473707E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.710 | TFLOPs: 22.61 | 31: iteration 307500/ 476837 | consumed samples: 78720000 | consumed tokens: 161218560000 | elapsed time per iteration (s): 0.68 | learning rate: 7.135E-05 | global batch size: 256 | lm loss: 2.472857E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.828 | TFLOPs: 22.74 | 31: iteration 307600/ 476837 | consumed samples: 78745600 | consumed tokens: 161270988800 | elapsed time per iteration (s): 0.68 | learning rate: 7.130E-05 | global batch size: 256 | lm loss: 2.476807E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.275 | TFLOPs: 22.76 | 31: iteration 307700/ 476837 | consumed samples: 78771200 | consumed tokens: 161323417600 | elapsed time per iteration (s): 0.68 | learning rate: 7.124E-05 
| global batch size: 256 | lm loss: 2.476367E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.501 | TFLOPs: 22.78 | 31: iteration 307800/ 476837 | consumed samples: 78796800 | consumed tokens: 161375846400 | elapsed time per iteration (s): 0.81 | learning rate: 7.119E-05 | global batch size: 256 | lm loss: 2.475811E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 316.612 | TFLOPs: 19.15 | 31: iteration 307900/ 476837 | consumed samples: 78822400 | consumed tokens: 161428275200 | elapsed time per iteration (s): 0.77 | learning rate: 7.113E-05 | global batch size: 256 | lm loss: 2.474127E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 331.640 | TFLOPs: 20.06 | 0: [2023-04-28 08:12:41,262] [INFO] [logging.py:68:log_dist] [Rank 0] step=308000, skipped=0, lr=[7.108065331641882e-05, 7.108065331641882e-05, 7.108065331641882e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 308000/ 476837 | consumed samples: 78848000 | consumed tokens: 161480704000 | elapsed time per iteration (s): 0.68 | learning rate: 7.108E-05 | global batch size: 256 | lm loss: 2.471904E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.794 | TFLOPs: 22.80 | 0: steps: 308000 loss: 2.5185 iter time (s): 0.689 samples/sec: 371.773 31: iteration 308100/ 476837 | consumed samples: 78873600 | consumed tokens: 161533132800 | elapsed time per iteration (s): 0.68 | learning rate: 7.103E-05 | global batch size: 256 | lm loss: 2.474435E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.482 | TFLOPs: 22.78 | 31: iteration 308200/ 476837 | consumed samples: 78899200 | consumed tokens: 161585561600 | elapsed 
time per iteration (s): 0.68 | learning rate: 7.097E-05 | global batch size: 256 | lm loss: 2.475256E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.536 | TFLOPs: 22.78 | 31: iteration 308300/ 476837 | consumed samples: 78924800 | consumed tokens: 161637990400 | elapsed time per iteration (s): 0.68 | learning rate: 7.092E-05 | global batch size: 256 | lm loss: 2.472639E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.885 | TFLOPs: 22.80 | 31: iteration 308400/ 476837 | consumed samples: 78950400 | consumed tokens: 161690419200 | elapsed time per iteration (s): 0.68 | learning rate: 7.086E-05 | global batch size: 256 | lm loss: 2.471761E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.177 | TFLOPs: 22.76 | 31: iteration 308500/ 476837 | consumed samples: 78976000 | consumed tokens: 161742848000 | elapsed time per iteration (s): 0.68 | learning rate: 7.081E-05 | global batch size: 256 | lm loss: 2.475261E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.845 | TFLOPs: 22.80 | 31: iteration 308600/ 476837 | consumed samples: 79001600 | consumed tokens: 161795276800 | elapsed time per iteration (s): 0.68 | learning rate: 7.076E-05 | global batch size: 256 | lm loss: 2.470114E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.818 | TFLOPs: 22.80 | 31: iteration 308700/ 476837 | consumed samples: 79027200 | consumed tokens: 161847705600 | elapsed time per iteration (s): 0.68 | learning rate: 7.070E-05 | global batch size: 256 | lm loss: 2.476279E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.720 
| TFLOPs: 22.79 | 31: iteration 308800/ 476837 | consumed samples: 79052800 | consumed tokens: 161900134400 | elapsed time per iteration (s): 0.68 | learning rate: 7.065E-05 | global batch size: 256 | lm loss: 2.471488E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.697 | TFLOPs: 22.79 | 31: iteration 308900/ 476837 | consumed samples: 79078400 | consumed tokens: 161952563200 | elapsed time per iteration (s): 0.68 | learning rate: 7.060E-05 | global batch size: 256 | lm loss: 2.474921E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.881 | TFLOPs: 22.74 | 31: iteration 309000/ 476837 | consumed samples: 79104000 | consumed tokens: 162004992000 | elapsed time per iteration (s): 0.68 | learning rate: 7.054E-05 | global batch size: 256 | lm loss: 2.471284E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.697 | TFLOPs: 22.79 | 31: iteration 309100/ 476837 | consumed samples: 79129600 | consumed tokens: 162057420800 | elapsed time per iteration (s): 0.68 | learning rate: 7.049E-05 | global batch size: 256 | lm loss: 2.471058E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.649 | TFLOPs: 22.79 | 31: iteration 309200/ 476837 | consumed samples: 79155200 | consumed tokens: 162109849600 | elapsed time per iteration (s): 0.68 | learning rate: 7.043E-05 | global batch size: 256 | lm loss: 2.468117E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.757 | TFLOPs: 22.79 | 31: iteration 309300/ 476837 | consumed samples: 79180800 | consumed tokens: 162162278400 | elapsed time per iteration (s): 0.68 | learning rate: 7.038E-05 | global batch size: 256 | lm loss: 2.475974E+00 | grad 
norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.783 | TFLOPs: 22.79 | 31: iteration 309400/ 476837 | consumed samples: 79206400 | consumed tokens: 162214707200 | elapsed time per iteration (s): 0.68 | learning rate: 7.033E-05 | global batch size: 256 | lm loss: 2.473963E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.969 | TFLOPs: 22.75 | 31: iteration 309500/ 476837 | consumed samples: 79232000 | consumed tokens: 162267136000 | elapsed time per iteration (s): 0.68 | learning rate: 7.027E-05 | global batch size: 256 | lm loss: 2.473272E+00 | grad norm: 0.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.098 | TFLOPs: 22.75 | 31: iteration 309600/ 476837 | consumed samples: 79257600 | consumed tokens: 162319564800 | elapsed time per iteration (s): 0.68 | learning rate: 7.022E-05 | global batch size: 256 | lm loss: 2.471701E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.276 | TFLOPs: 22.76 | 31: iteration 309700/ 476837 | consumed samples: 79283200 | consumed tokens: 162371993600 | elapsed time per iteration (s): 0.68 | learning rate: 7.017E-05 | global batch size: 256 | lm loss: 2.471303E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.774 | TFLOPs: 22.67 | 31: iteration 309800/ 476837 | consumed samples: 79308800 | consumed tokens: 162424422400 | elapsed time per iteration (s): 0.68 | learning rate: 7.011E-05 | global batch size: 256 | lm loss: 2.470880E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.170 | TFLOPs: 22.64 | 31: iteration 309900/ 476837 | consumed samples: 79334400 | consumed tokens: 162476851200 | 
elapsed time per iteration (s): 0.68 | learning rate: 7.006E-05 | global batch size: 256 | lm loss: 2.471543E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.727 | TFLOPs: 22.73 | 0: [2023-04-28 08:35:22,314] [INFO] [logging.py:68:log_dist] [Rank 0] step=310000, skipped=0, lr=[7.000403776286021e-05, 7.000403776286021e-05, 7.000403776286021e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 310000/ 476837 | consumed samples: 79360000 | consumed tokens: 162529280000 | elapsed time per iteration (s): 0.68 | learning rate: 7.000E-05 | global batch size: 256 | lm loss: 2.475863E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.682 | TFLOPs: 22.67 | 0: steps: 310000 loss: 2.4535 iter time (s): 0.677 samples/sec: 377.906 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 310000 | lm loss value: 2.942449E+00 | lm loss PPL: 1.896223E+01 | 31: ------------------------------------------------------------------------------------------------- 31: iteration 310100/ 476837 | consumed samples: 79385600 | consumed tokens: 162581708800 | elapsed time per iteration (s): 0.68 | learning rate: 6.995E-05 | global batch size: 256 | lm loss: 2.477169E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.263 | TFLOPs: 22.70 | 31: iteration 310200/ 476837 | consumed samples: 79411200 | consumed tokens: 162634137600 | elapsed time per iteration (s): 0.68 | learning rate: 6.990E-05 | global batch size: 256 | lm loss: 2.471923E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.095 | TFLOPs: 22.75 | 31: iteration 310300/ 476837 | consumed samples: 79436800 | consumed tokens: 
162686566400 | elapsed time per iteration (s): 0.68 | learning rate: 6.984E-05 | global batch size: 256 | lm loss: 2.473911E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.889 | TFLOPs: 22.62 | 31: iteration 310400/ 476837 | consumed samples: 79462400 | consumed tokens: 162738995200 | elapsed time per iteration (s): 0.68 | learning rate: 6.979E-05 | global batch size: 256 | lm loss: 2.468173E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.532 | TFLOPs: 22.78 | 31: iteration 310500/ 476837 | consumed samples: 79488000 | consumed tokens: 162791424000 | elapsed time per iteration (s): 0.68 | learning rate: 6.974E-05 | global batch size: 256 | lm loss: 2.472814E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.200 | TFLOPs: 22.64 | 31: iteration 310600/ 476837 | consumed samples: 79513600 | consumed tokens: 162843852800 | elapsed time per iteration (s): 0.68 | learning rate: 6.968E-05 | global batch size: 256 | lm loss: 2.474383E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.465 | TFLOPs: 22.78 | 31: iteration 310700/ 476837 | consumed samples: 79539200 | consumed tokens: 162896281600 | elapsed time per iteration (s): 0.68 | learning rate: 6.963E-05 | global batch size: 256 | lm loss: 2.469709E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.615 | TFLOPs: 22.72 | 31: iteration 310800/ 476837 | consumed samples: 79564800 | consumed tokens: 162948710400 | elapsed time per iteration (s): 0.75 | learning rate: 6.958E-05 | global batch size: 256 | lm loss: 2.470010E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 341.383 | TFLOPs: 20.65 | 31: iteration 310900/ 476837 | consumed samples: 79590400 | consumed tokens: 163001139200 | elapsed time per iteration (s): 0.85 | learning rate: 6.952E-05 | global batch size: 256 | lm loss: 2.474907E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 302.812 | TFLOPs: 18.32 | 31: iteration 311000/ 476837 | consumed samples: 79616000 | consumed tokens: 163053568000 | elapsed time per iteration (s): 0.68 | learning rate: 6.947E-05 | global batch size: 256 | lm loss: 2.475852E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.352 | TFLOPs: 22.77 | 31: iteration 311100/ 476837 | consumed samples: 79641600 | consumed tokens: 163105996800 | elapsed time per iteration (s): 0.68 | learning rate: 6.941E-05 | global batch size: 256 | lm loss: 2.472522E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.681 | TFLOPs: 22.79 | 31: iteration 311200/ 476837 | consumed samples: 79667200 | consumed tokens: 163158425600 | elapsed time per iteration (s): 0.68 | learning rate: 6.936E-05 | global batch size: 256 | lm loss: 2.470620E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.509 | TFLOPs: 22.78 | 31: iteration 311300/ 476837 | consumed samples: 79692800 | consumed tokens: 163210854400 | elapsed time per iteration (s): 0.68 | learning rate: 6.931E-05 | global batch size: 256 | lm loss: 2.469420E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.702 | TFLOPs: 22.79 | 31: iteration 311400/ 476837 | consumed samples: 79718400 | consumed tokens: 163263283200 | elapsed time per iteration (s): 0.68 | learning rate: 6.925E-05 | global batch size: 256 | lm 
loss: 2.471535E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.044 | TFLOPs: 22.69 | 31: iteration 311500/ 476837 | consumed samples: 79744000 | consumed tokens: 163315712000 | elapsed time per iteration (s): 0.68 | learning rate: 6.920E-05 | global batch size: 256 | lm loss: 2.470353E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.557 | TFLOPs: 22.72 | 31: iteration 311600/ 476837 | consumed samples: 79769600 | consumed tokens: 163368140800 | elapsed time per iteration (s): 0.68 | learning rate: 6.915E-05 | global batch size: 256 | lm loss: 2.469169E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.728 | TFLOPs: 22.73 | 31: iteration 311700/ 476837 | consumed samples: 79795200 | consumed tokens: 163420569600 | elapsed time per iteration (s): 0.68 | learning rate: 6.909E-05 | global batch size: 256 | lm loss: 2.470924E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.513 | TFLOPs: 22.78 | 31: iteration 311800/ 476837 | consumed samples: 79820800 | consumed tokens: 163472998400 | elapsed time per iteration (s): 0.68 | learning rate: 6.904E-05 | global batch size: 256 | lm loss: 2.468698E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.020 | TFLOPs: 22.75 | 31: iteration 311900/ 476837 | consumed samples: 79846400 | consumed tokens: 163525427200 | elapsed time per iteration (s): 0.68 | learning rate: 6.899E-05 | global batch size: 256 | lm loss: 2.469573E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.154 | TFLOPs: 22.70 | 0: [2023-04-28 08:58:28,039] [INFO] [logging.py:68:log_dist] [Rank 
0] step=312000, skipped=0, lr=[6.893450751999844e-05, 6.893450751999844e-05, 6.893450751999844e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 312000/ 476837 | consumed samples: 79872000 | consumed tokens: 163577856000 | elapsed time per iteration (s): 0.68 | learning rate: 6.893E-05 | global batch size: 256 | lm loss: 2.470960E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.052 | TFLOPs: 22.75 | 0: steps: 312000 loss: 2.5006 iter time (s): 0.689 samples/sec: 371.357 31: iteration 312100/ 476837 | consumed samples: 79897600 | consumed tokens: 163630284800 | elapsed time per iteration (s): 0.68 | learning rate: 6.888E-05 | global batch size: 256 | lm loss: 2.472220E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.766 | TFLOPs: 22.79 | 31: iteration 312200/ 476837 | consumed samples: 79923200 | consumed tokens: 163682713600 | elapsed time per iteration (s): 0.68 | learning rate: 6.883E-05 | global batch size: 256 | lm loss: 2.465949E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.855 | TFLOPs: 22.80 | 31: iteration 312300/ 476837 | consumed samples: 79948800 | consumed tokens: 163735142400 | elapsed time per iteration (s): 0.68 | learning rate: 6.877E-05 | global batch size: 256 | lm loss: 2.475354E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.713 | TFLOPs: 22.79 | 31: iteration 312400/ 476837 | consumed samples: 79974400 | consumed tokens: 163787571200 | elapsed time per iteration (s): 0.68 | learning rate: 6.872E-05 | global batch size: 256 | lm loss: 2.472745E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.564 | TFLOPs: 22.78 | 31: iteration 
312500/ 476837 | consumed samples: 80000000 | consumed tokens: 163840000000 | elapsed time per iteration (s): 0.68 | learning rate: 6.867E-05 | global batch size: 256 | lm loss: 2.468885E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.235 | TFLOPs: 22.76 | 31: iteration 312600/ 476837 | consumed samples: 80025600 | consumed tokens: 163892428800 | elapsed time per iteration (s): 0.68 | learning rate: 6.862E-05 | global batch size: 256 | lm loss: 2.467912E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.576 | TFLOPs: 22.78 | 31: iteration 312700/ 476837 | consumed samples: 80051200 | consumed tokens: 163944857600 | elapsed time per iteration (s): 0.68 | learning rate: 6.856E-05 | global batch size: 256 | lm loss: 2.465538E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.395 | TFLOPs: 22.71 | 31: iteration 312800/ 476837 | consumed samples: 80076800 | consumed tokens: 163997286400 | elapsed time per iteration (s): 0.68 | learning rate: 6.851E-05 | global batch size: 256 | lm loss: 2.468812E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.909 | TFLOPs: 22.68 | 31: iteration 312900/ 476837 | consumed samples: 80102400 | consumed tokens: 164049715200 | elapsed time per iteration (s): 0.68 | learning rate: 6.846E-05 | global batch size: 256 | lm loss: 2.467112E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.154 | TFLOPs: 22.76 | 31: iteration 313000/ 476837 | consumed samples: 80128000 | consumed tokens: 164102144000 | elapsed time per iteration (s): 0.68 | learning rate: 6.840E-05 | global batch size: 256 | lm loss: 2.474079E+00 | grad norm: 0.371 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.240 | TFLOPs: 22.70 | 31: iteration 313100/ 476837 | consumed samples: 80153600 | consumed tokens: 164154572800 | elapsed time per iteration (s): 0.68 | learning rate: 6.835E-05 | global batch size: 256 | lm loss: 2.472927E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.856 | TFLOPs: 22.62 | 31: iteration 313200/ 476837 | consumed samples: 80179200 | consumed tokens: 164207001600 | elapsed time per iteration (s): 0.68 | learning rate: 6.830E-05 | global batch size: 256 | lm loss: 2.468006E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.605 | TFLOPs: 22.72 | 31: iteration 313300/ 476837 | consumed samples: 80204800 | consumed tokens: 164259430400 | elapsed time per iteration (s): 0.68 | learning rate: 6.824E-05 | global batch size: 256 | lm loss: 2.466425E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.508 | TFLOPs: 22.78 | 31: iteration 313400/ 476837 | consumed samples: 80230400 | consumed tokens: 164311859200 | elapsed time per iteration (s): 0.68 | learning rate: 6.819E-05 | global batch size: 256 | lm loss: 2.468968E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.570 | TFLOPs: 22.78 | 31: iteration 313500/ 476837 | consumed samples: 80256000 | consumed tokens: 164364288000 | elapsed time per iteration (s): 0.68 | learning rate: 6.814E-05 | global batch size: 256 | lm loss: 2.470549E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.058 | TFLOPs: 22.63 | 31: iteration 313600/ 476837 | consumed samples: 80281600 | consumed tokens: 164416716800 | elapsed time per iteration (s): 
0.68 | learning rate: 6.808E-05 | global batch size: 256 | lm loss: 2.470414E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.244 | TFLOPs: 22.76 | 31: iteration 313700/ 476837 | consumed samples: 80307200 | consumed tokens: 164469145600 | elapsed time per iteration (s): 0.68 | learning rate: 6.803E-05 | global batch size: 256 | lm loss: 2.470163E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.597 | TFLOPs: 22.78 | 31: iteration 313800/ 476837 | consumed samples: 80332800 | consumed tokens: 164521574400 | elapsed time per iteration (s): 0.68 | learning rate: 6.798E-05 | global batch size: 256 | lm loss: 2.471315E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.836 | TFLOPs: 22.68 | 31: iteration 313900/ 476837 | consumed samples: 80358400 | consumed tokens: 164574003200 | elapsed time per iteration (s): 0.85 | learning rate: 6.793E-05 | global batch size: 256 | lm loss: 2.470488E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 302.673 | TFLOPs: 18.31 | 0: [2023-04-28 09:21:32,823] [INFO] [logging.py:68:log_dist] [Rank 0] step=314000, skipped=0, lr=[6.787225205581096e-05, 6.787225205581096e-05, 6.787225205581096e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 314000/ 476837 | consumed samples: 80384000 | consumed tokens: 164626432000 | elapsed time per iteration (s): 0.74 | learning rate: 6.787E-05 | global batch size: 256 | lm loss: 2.469204E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 344.810 | TFLOPs: 20.86 | 0: steps: 314000 loss: 2.4379 iter time (s): 0.689 samples/sec: 371.513 31: iteration 314100/ 476837 | consumed samples: 80409600 | consumed 
tokens: 164678860800 | elapsed time per iteration (s): 0.68 | learning rate: 6.782E-05 | global batch size: 256 | lm loss: 2.465112E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.642 | TFLOPs: 22.79 | 31: iteration 314200/ 476837 | consumed samples: 80435200 | consumed tokens: 164731289600 | elapsed time per iteration (s): 0.68 | learning rate: 6.777E-05 | global batch size: 256 | lm loss: 2.469819E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.797 | TFLOPs: 22.80 | 31: iteration 314300/ 476837 | consumed samples: 80460800 | consumed tokens: 164783718400 | elapsed time per iteration (s): 0.68 | learning rate: 6.771E-05 | global batch size: 256 | lm loss: 2.467442E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.663 | TFLOPs: 22.79 | 31: iteration 314400/ 476837 | consumed samples: 80486400 | consumed tokens: 164836147200 | elapsed time per iteration (s): 0.68 | learning rate: 6.766E-05 | global batch size: 256 | lm loss: 2.465224E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.655 | TFLOPs: 22.79 | 31: iteration 314500/ 476837 | consumed samples: 80512000 | consumed tokens: 164888576000 | elapsed time per iteration (s): 0.68 | learning rate: 6.761E-05 | global batch size: 256 | lm loss: 2.469511E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.202 | TFLOPs: 22.76 | 31: iteration 314600/ 476837 | consumed samples: 80537600 | consumed tokens: 164941004800 | elapsed time per iteration (s): 0.68 | learning rate: 6.756E-05 | global batch size: 256 | lm loss: 2.470751E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 376.469 | TFLOPs: 22.78 | 31: iteration 314700/ 476837 | consumed samples: 80563200 | consumed tokens: 164993433600 | elapsed time per iteration (s): 0.68 | learning rate: 6.750E-05 | global batch size: 256 | lm loss: 2.466621E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.568 | TFLOPs: 22.78 | 31: iteration 314800/ 476837 | consumed samples: 80588800 | consumed tokens: 165045862400 | elapsed time per iteration (s): 0.68 | learning rate: 6.745E-05 | global batch size: 256 | lm loss: 2.468243E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.578 | TFLOPs: 22.78 | 31: iteration 314900/ 476837 | consumed samples: 80614400 | consumed tokens: 165098291200 | elapsed time per iteration (s): 0.68 | learning rate: 6.740E-05 | global batch size: 256 | lm loss: 2.470314E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.640 | TFLOPs: 22.79 | 31: iteration 315000/ 476837 | consumed samples: 80640000 | consumed tokens: 165150720000 | elapsed time per iteration (s): 0.68 | learning rate: 6.734E-05 | global batch size: 256 | lm loss: 2.465445E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.640 | TFLOPs: 22.79 | 31: iteration 315100/ 476837 | consumed samples: 80665600 | consumed tokens: 165203148800 | elapsed time per iteration (s): 0.68 | learning rate: 6.729E-05 | global batch size: 256 | lm loss: 2.470672E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.776 | TFLOPs: 22.73 | 31: iteration 315200/ 476837 | consumed samples: 80691200 | consumed tokens: 165255577600 | elapsed time per iteration (s): 0.68 | learning rate: 6.724E-05 | global batch size: 256 | 
lm loss: 2.465207E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.071 | TFLOPs: 22.75 | 31: iteration 315300/ 476837 | consumed samples: 80716800 | consumed tokens: 165308006400 | elapsed time per iteration (s): 0.68 | learning rate: 6.719E-05 | global batch size: 256 | lm loss: 2.464150E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.795 | TFLOPs: 22.67 | 31: iteration 315400/ 476837 | consumed samples: 80742400 | consumed tokens: 165360435200 | elapsed time per iteration (s): 0.68 | learning rate: 6.713E-05 | global batch size: 256 | lm loss: 2.463692E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.456 | TFLOPs: 22.71 | 31: iteration 315500/ 476837 | consumed samples: 80768000 | consumed tokens: 165412864000 | elapsed time per iteration (s): 0.68 | learning rate: 6.708E-05 | global batch size: 256 | lm loss: 2.471451E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.646 | TFLOPs: 22.73 | 31: iteration 315600/ 476837 | consumed samples: 80793600 | consumed tokens: 165465292800 | elapsed time per iteration (s): 0.68 | learning rate: 6.703E-05 | global batch size: 256 | lm loss: 2.467574E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.437 | TFLOPs: 22.71 | 31: iteration 315700/ 476837 | consumed samples: 80819200 | consumed tokens: 165517721600 | elapsed time per iteration (s): 0.68 | learning rate: 6.698E-05 | global batch size: 256 | lm loss: 2.469827E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.608 | TFLOPs: 22.66 | 31: iteration 315800/ 476837 | consumed samples: 80844800 | 
consumed tokens: 165570150400 | elapsed time per iteration (s): 0.68 | learning rate: 6.692E-05 | global batch size: 256 | lm loss: 2.469437E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.873 | TFLOPs: 22.74 | 31: iteration 315900/ 476837 | consumed samples: 80870400 | consumed tokens: 165622579200 | elapsed time per iteration (s): 0.68 | learning rate: 6.687E-05 | global batch size: 256 | lm loss: 2.467300E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.520 | TFLOPs: 22.72 | 0: [2023-04-28 09:44:14,463] [INFO] [logging.py:68:log_dist] [Rank 0] step=316000, skipped=0, lr=[6.681745954954335e-05, 6.681745954954335e-05, 6.681745954954335e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 316000/ 476837 | consumed samples: 80896000 | consumed tokens: 165675008000 | elapsed time per iteration (s): 0.68 | learning rate: 6.682E-05 | global batch size: 256 | lm loss: 2.468303E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.329 | TFLOPs: 22.71 | 0: steps: 316000 loss: 2.4290 iter time (s): 0.677 samples/sec: 377.863 31: iteration 316100/ 476837 | consumed samples: 80921600 | consumed tokens: 165727436800 | elapsed time per iteration (s): 0.68 | learning rate: 6.676E-05 | global batch size: 256 | lm loss: 2.467083E+00 | grad norm: 0.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.561 | TFLOPs: 22.78 | 31: iteration 316200/ 476837 | consumed samples: 80947200 | consumed tokens: 165779865600 | elapsed time per iteration (s): 0.68 | learning rate: 6.671E-05 | global batch size: 256 | lm loss: 2.469245E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.668 | TFLOPs: 22.67 | 31: 
iteration 316300/ 476837 | consumed samples: 80972800 | consumed tokens: 165832294400 | elapsed time per iteration (s): 0.68 | learning rate: 6.666E-05 | global batch size: 256 | lm loss: 2.468501E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.415 | TFLOPs: 22.77 | 31: iteration 316400/ 476837 | consumed samples: 80998400 | consumed tokens: 165884723200 | elapsed time per iteration (s): 0.68 | learning rate: 6.661E-05 | global batch size: 256 | lm loss: 2.464753E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.578 | TFLOPs: 22.78 | 31: iteration 316500/ 476837 | consumed samples: 81024000 | consumed tokens: 165937152000 | elapsed time per iteration (s): 0.68 | learning rate: 6.655E-05 | global batch size: 256 | lm loss: 2.463599E+00 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.832 | TFLOPs: 22.74 | 31: iteration 316600/ 476837 | consumed samples: 81049600 | consumed tokens: 165989580800 | elapsed time per iteration (s): 0.68 | learning rate: 6.650E-05 | global batch size: 256 | lm loss: 2.473548E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.691 | TFLOPs: 22.79 | 31: iteration 316700/ 476837 | consumed samples: 81075200 | consumed tokens: 166042009600 | elapsed time per iteration (s): 0.68 | learning rate: 6.645E-05 | global batch size: 256 | lm loss: 2.460053E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.534 | TFLOPs: 22.78 | 31: iteration 316800/ 476837 | consumed samples: 81100800 | consumed tokens: 166094438400 | elapsed time per iteration (s): 0.68 | learning rate: 6.640E-05 | global batch size: 256 | lm loss: 2.464345E+00 | grad norm: 0.389 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.558 | TFLOPs: 22.78 | 31: iteration 316900/ 476837 | consumed samples: 81126400 | consumed tokens: 166146867200 | elapsed time per iteration (s): 0.68 | learning rate: 6.635E-05 | global batch size: 256 | lm loss: 2.474976E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.738 | TFLOPs: 22.67 | 31: iteration 317000/ 476837 | consumed samples: 81152000 | consumed tokens: 166199296000 | elapsed time per iteration (s): 0.87 | learning rate: 6.629E-05 | global batch size: 256 | lm loss: 2.463838E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 295.364 | TFLOPs: 17.87 | 31: iteration 317100/ 476837 | consumed samples: 81177600 | consumed tokens: 166251724800 | elapsed time per iteration (s): 0.73 | learning rate: 6.624E-05 | global batch size: 256 | lm loss: 2.464636E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 348.406 | TFLOPs: 21.08 | 31: iteration 317200/ 476837 | consumed samples: 81203200 | consumed tokens: 166304153600 | elapsed time per iteration (s): 0.68 | learning rate: 6.619E-05 | global batch size: 256 | lm loss: 2.467129E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.533 | TFLOPs: 22.72 | 31: iteration 317300/ 476837 | consumed samples: 81228800 | consumed tokens: 166356582400 | elapsed time per iteration (s): 0.68 | learning rate: 6.614E-05 | global batch size: 256 | lm loss: 2.467787E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.800 | TFLOPs: 22.80 | 31: iteration 317400/ 476837 | consumed samples: 81254400 | consumed tokens: 166409011200 | elapsed time per iteration 
(s): 0.68 | learning rate: 6.608E-05 | global batch size: 256 | lm loss: 2.467873E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.723 | TFLOPs: 22.79 | 31: iteration 317500/ 476837 | consumed samples: 81280000 | consumed tokens: 166461440000 | elapsed time per iteration (s): 0.68 | learning rate: 6.603E-05 | global batch size: 256 | lm loss: 2.464109E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.505 | TFLOPs: 22.72 | 31: iteration 317600/ 476837 | consumed samples: 81305600 | consumed tokens: 166513868800 | elapsed time per iteration (s): 0.68 | learning rate: 6.598E-05 | global batch size: 256 | lm loss: 2.463410E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.087 | TFLOPs: 22.75 | 31: iteration 317700/ 476837 | consumed samples: 81331200 | consumed tokens: 166566297600 | elapsed time per iteration (s): 0.68 | learning rate: 6.593E-05 | global batch size: 256 | lm loss: 2.466441E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.407 | TFLOPs: 22.77 | 31: iteration 317800/ 476837 | consumed samples: 81356800 | consumed tokens: 166618726400 | elapsed time per iteration (s): 0.68 | learning rate: 6.587E-05 | global batch size: 256 | lm loss: 2.466480E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.026 | TFLOPs: 22.75 | 31: iteration 317900/ 476837 | consumed samples: 81382400 | consumed tokens: 166671155200 | elapsed time per iteration (s): 0.68 | learning rate: 6.582E-05 | global batch size: 256 | lm loss: 2.464096E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.234 | TFLOPs: 22.76 | 
0: [2023-04-28 10:07:19,815] [INFO] [logging.py:68:log_dist] [Rank 0] step=318000, skipped=0, lr=[6.577031685837343e-05, 6.577031685837343e-05, 6.577031685837343e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 318000/ 476837 | consumed samples: 81408000 | consumed tokens: 166723584000 | elapsed time per iteration (s): 0.68 | learning rate: 6.577E-05 | global batch size: 256 | lm loss: 2.464530E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.961 | TFLOPs: 22.74 | 0: steps: 318000 loss: 2.4384 iter time (s): 0.689 samples/sec: 371.385 31: iteration 318100/ 476837 | consumed samples: 81433600 | consumed tokens: 166776012800 | elapsed time per iteration (s): 0.69 | learning rate: 6.572E-05 | global batch size: 256 | lm loss: 2.461976E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.643 | TFLOPs: 22.54 | 31: iteration 318200/ 476837 | consumed samples: 81459200 | consumed tokens: 166828441600 | elapsed time per iteration (s): 0.68 | learning rate: 6.567E-05 | global batch size: 256 | lm loss: 2.464376E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.491 | TFLOPs: 22.78 | 31: iteration 318300/ 476837 | consumed samples: 81484800 | consumed tokens: 166880870400 | elapsed time per iteration (s): 0.68 | learning rate: 6.561E-05 | global batch size: 256 | lm loss: 2.466422E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.530 | TFLOPs: 22.78 | 31: iteration 318400/ 476837 | consumed samples: 81510400 | consumed tokens: 166933299200 | elapsed time per iteration (s): 0.68 | learning rate: 6.556E-05 | global batch size: 256 | lm loss: 2.460207E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 375.081 | TFLOPs: 22.69 | 31: iteration 318500/ 476837 | consumed samples: 81536000 | consumed tokens: 166985728000 | elapsed time per iteration (s): 0.68 | learning rate: 6.551E-05 | global batch size: 256 | lm loss: 2.462065E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.244 | TFLOPs: 22.76 | 31: iteration 318600/ 476837 | consumed samples: 81561600 | consumed tokens: 167038156800 | elapsed time per iteration (s): 0.68 | learning rate: 6.546E-05 | global batch size: 256 | lm loss: 2.463509E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.221 | TFLOPs: 22.76 | 31: iteration 318700/ 476837 | consumed samples: 81587200 | consumed tokens: 167090585600 | elapsed time per iteration (s): 0.68 | learning rate: 6.541E-05 | global batch size: 256 | lm loss: 2.463121E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.160 | TFLOPs: 22.64 | 31: iteration 318800/ 476837 | consumed samples: 81612800 | consumed tokens: 167143014400 | elapsed time per iteration (s): 0.68 | learning rate: 6.535E-05 | global batch size: 256 | lm loss: 2.465549E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.394 | TFLOPs: 22.77 | 31: iteration 318900/ 476837 | consumed samples: 81638400 | consumed tokens: 167195443200 | elapsed time per iteration (s): 0.68 | learning rate: 6.530E-05 | global batch size: 256 | lm loss: 2.467796E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.730 | TFLOPs: 22.73 | 31: iteration 319000/ 476837 | consumed samples: 81664000 | consumed tokens: 167247872000 | elapsed time per iteration (s): 0.68 | learning rate: 6.525E-05 | global batch 
size: 256 | lm loss: 2.459238E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.650 | TFLOPs: 22.79 | 31: iteration 319100/ 476837 | consumed samples: 81689600 | consumed tokens: 167300300800 | elapsed time per iteration (s): 0.68 | learning rate: 6.520E-05 | global batch size: 256 | lm loss: 2.466123E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.036 | TFLOPs: 22.75 | 31: iteration 319200/ 476837 | consumed samples: 81715200 | consumed tokens: 167352729600 | elapsed time per iteration (s): 0.68 | learning rate: 6.515E-05 | global batch size: 256 | lm loss: 2.462437E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.656 | TFLOPs: 22.73 | 31: iteration 319300/ 476837 | consumed samples: 81740800 | consumed tokens: 167405158400 | elapsed time per iteration (s): 0.68 | learning rate: 6.509E-05 | global batch size: 256 | lm loss: 2.467934E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.305 | TFLOPs: 22.77 | 31: iteration 319400/ 476837 | consumed samples: 81766400 | consumed tokens: 167457587200 | elapsed time per iteration (s): 0.68 | learning rate: 6.504E-05 | global batch size: 256 | lm loss: 2.462220E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.797 | TFLOPs: 22.80 | 31: iteration 319500/ 476837 | consumed samples: 81792000 | consumed tokens: 167510016000 | elapsed time per iteration (s): 0.68 | learning rate: 6.499E-05 | global batch size: 256 | lm loss: 2.462503E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.391 | TFLOPs: 22.77 | 31: iteration 319600/ 476837 | consumed samples: 
81817600 | consumed tokens: 167562444800 | elapsed time per iteration (s): 0.68 | learning rate: 6.494E-05 | global batch size: 256 | lm loss: 2.466483E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.220 | TFLOPs: 22.70 | 31: iteration 319700/ 476837 | consumed samples: 81843200 | consumed tokens: 167614873600 | elapsed time per iteration (s): 0.68 | learning rate: 6.489E-05 | global batch size: 256 | lm loss: 2.466762E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.338 | TFLOPs: 22.77 | 31: iteration 319800/ 476837 | consumed samples: 81868800 | consumed tokens: 167667302400 | elapsed time per iteration (s): 0.68 | learning rate: 6.483E-05 | global batch size: 256 | lm loss: 2.463793E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.963 | TFLOPs: 22.81 | 31: iteration 319900/ 476837 | consumed samples: 81894400 | consumed tokens: 167719731200 | elapsed time per iteration (s): 0.68 | learning rate: 6.478E-05 | global batch size: 256 | lm loss: 2.464988E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.958 | TFLOPs: 22.81 | 0: [2023-04-28 10:30:01,578] [INFO] [logging.py:68:log_dist] [Rank 0] step=320000, skipped=0, lr=[6.473100948430931e-05, 6.473100948430931e-05, 6.473100948430931e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 320000/ 476837 | consumed samples: 81920000 | consumed tokens: 167772160000 | elapsed time per iteration (s): 0.68 | learning rate: 6.473E-05 | global batch size: 256 | lm loss: 2.462788E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.913 | TFLOPs: 22.80 | 0: steps: 320000 loss: 2.4195 iter time (s): 0.678 samples/sec: 
377.700
31: -------------------------------------------------------------------------------------------------
31: validation loss at iteration 320000 | lm loss value: 2.928006E+00 | lm loss PPL: 1.869033E+01 |
31: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 320000 to checkpoints_1b1250b1b5
0: [2023-04-28 10:30:01,845] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step320000 is begin to save!
0: [2023-04-28 10:30:01,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_01-model_00-model_states.pt...
0: [2023-04-28 10:30:02,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_01-model_00-model_states.pt.
0: [2023-04-28 10:30:02,108] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_03-model_00-model_states.pt...
0: [2023-04-28 10:30:02,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_03-model_00-model_states.pt.
0: [2023-04-28 10:30:02,188] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_04-model_00-model_states.pt...
0: [2023-04-28 10:30:02,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_04-model_00-model_states.pt.
0: [2023-04-28 10:30:02,281] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_05-model_00-model_states.pt...
0: [2023-04-28 10:30:02,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_05-model_00-model_states.pt.
0: [2023-04-28 10:30:02,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_06-model_00-model_states.pt...
0: [2023-04-28 10:30:02,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_06-model_00-model_states.pt. 0: [2023-04-28 10:30:02,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_07-model_00-model_states.pt... 0: [2023-04-28 10:30:02,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_07-model_00-model_states.pt. 0: [2023-04-28 10:30:02,549] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_08-model_00-model_states.pt... 0: [2023-04-28 10:30:02,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_08-model_00-model_states.pt. 0: [2023-04-28 10:30:02,637] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_09-model_00-model_states.pt... 0: [2023-04-28 10:30:02,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_09-model_00-model_states.pt. 0: [2023-04-28 10:30:02,727] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_10-model_00-model_states.pt... 0: [2023-04-28 10:30:02,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_10-model_00-model_states.pt. 0: [2023-04-28 10:30:02,814] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_11-model_00-model_states.pt... 0: [2023-04-28 10:30:02,905] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_11-model_00-model_states.pt. 0: [2023-04-28 10:30:02,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_12-model_00-model_states.pt... 
0: [2023-04-28 10:30:03,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_12-model_00-model_states.pt. 0: [2023-04-28 10:30:03,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_13-model_00-model_states.pt... 0: [2023-04-28 10:30:03,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_13-model_00-model_states.pt. 0: [2023-04-28 10:30:03,074] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_14-model_00-model_states.pt... 0: [2023-04-28 10:30:03,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_14-model_00-model_states.pt. 0: [2023-04-28 10:30:03,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_15-model_00-model_states.pt... 0: [2023-04-28 10:30:03,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_15-model_00-model_states.pt. 0: [2023-04-28 10:30:03,238] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_16-model_00-model_states.pt... 0: [2023-04-28 10:30:03,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_16-model_00-model_states.pt. 0: [2023-04-28 10:30:03,323] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_17-model_00-model_states.pt... 0: [2023-04-28 10:30:03,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_17-model_00-model_states.pt. 0: [2023-04-28 10:30:03,401] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_18-model_00-model_states.pt... 
0: [2023-04-28 10:30:03,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_18-model_00-model_states.pt. 0: [2023-04-28 10:30:03,486] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_19-model_00-model_states.pt... 0: [2023-04-28 10:30:03,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_19-model_00-model_states.pt. 0: [2023-04-28 10:30:03,574] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_20-model_00-model_states.pt... 0: [2023-04-28 10:30:03,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_20-model_00-model_states.pt. 0: [2023-04-28 10:30:03,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_21-model_00-model_states.pt... 0: [2023-04-28 10:30:03,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_21-model_00-model_states.pt. 0: [2023-04-28 10:30:03,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_22-model_00-model_states.pt... 0: [2023-04-28 10:30:03,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_22-model_00-model_states.pt. 0: [2023-04-28 10:30:03,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_23-model_00-model_states.pt... 0: [2023-04-28 10:30:03,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_23-model_00-model_states.pt. 0: [2023-04-28 10:30:03,925] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_24-model_00-model_states.pt... 
0: [2023-04-28 10:30:03,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_24-model_00-model_states.pt. 0: [2023-04-28 10:30:03,999] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_25-model_00-model_states.pt... 0: [2023-04-28 10:30:04,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_25-model_00-model_states.pt. 0: [2023-04-28 10:30:04,089] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_26-model_00-model_states.pt... 0: [2023-04-28 10:30:04,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_26-model_00-model_states.pt. 0: [2023-04-28 10:30:04,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_27-model_00-model_states.pt... 0: [2023-04-28 10:30:04,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_27-model_00-model_states.pt. 0: [2023-04-28 10:30:04,267] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_28-model_00-model_states.pt... 0: [2023-04-28 10:30:04,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_28-model_00-model_states.pt. 0: [2023-04-28 10:30:04,357] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/layer_30-model_00-model_states.pt... 0: [2023-04-28 10:30:04,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/layer_30-model_00-model_states.pt. 
0: [2023-04-28 10:30:04,360] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step320000/mp_rank_00_model_states.pt 0: [2023-04-28 10:30:04,360] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/mp_rank_00_model_states.pt... 0: [2023-04-28 10:30:04,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/mp_rank_00_model_states.pt. 0: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 4: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 11: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 11: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 11: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 
12: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 12: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 12: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 17: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 17: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 17: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 17: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 21: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 21: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 
28: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 28: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 28: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 26: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 26: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 26: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 30: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 30: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 31: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 
16: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 16: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 16: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 16: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 0: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 4: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 5: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 8: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 3: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 10: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 10: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 10: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 10: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 9: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 9: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 9: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 14: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 14: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 
15: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 15: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 15: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 12: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 13: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 13: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 20: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 20: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 
20: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 19: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 19: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 19: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 18: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 18: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 18: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 18: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 24: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 24: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 
24: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 24: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 27: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 23: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 23: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 29: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 29: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 29: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 25: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 25: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 
25: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 28: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 30: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 31: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 22: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 22: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 22: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 6: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 2: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 8: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 11: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 
11: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 11: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 3: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 10: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 9: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 9: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 9: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 14: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 
14: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 15: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 15: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 15: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 15: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 12: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 12: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 13: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 20: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 19: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 
18: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 24: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 17: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 17: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 27: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 21: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 21: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 23: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 29: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 25: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 
25: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 28: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 28: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 28: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 28: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 26: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 26: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 30: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 31: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 16: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 
16: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 16: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 22: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 6: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 4: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
5: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 2: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 8: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 8: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 8: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 8: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 11: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 3: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 10: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 
10: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 10: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 9: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 9: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 14: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 14: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 14: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 14: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 12: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 12: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 13: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 
13: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 20: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 19: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 19: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 18: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 18: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 24: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 24: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 17: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 27: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 
21: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 21: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 21: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 23: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 29: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 29: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 25: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 25: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 26: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 
30: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 31: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 16: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 22: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 6: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
5: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 8: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 13: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 13: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 20: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 19: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 18: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 24: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 17: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 
27: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 23: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 23: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 29: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 26: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 30: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 31: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 31: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 22: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 1: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
5: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 8: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 20: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 20: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 19: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 27: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 27: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 27: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 23: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 30: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 31: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 
22: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 27: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 23: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 30: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 0: [2023-04-28 10:30:04,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-04-28 10:30:04,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-04-28 10:30:04,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-28 10:30:04,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 20: [2023-04-28 10:30:04,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-28 10:30:04,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-28 10:30:04,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
17: [2023-04-28 10:30:04,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2023-04-28 10:30:04,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-28 10:30:04,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 12: [2023-04-28 10:30:04,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-28 10:30:04,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-28 10:30:04,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 4: [2023-04-28 10:30:04,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 17: [2023-04-28 10:30:04,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 17: [2023-04-28 10:30:04,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-28 10:30:04,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 12: [2023-04-28 10:30:04,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 
12: [2023-04-28 10:30:04,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-28 10:30:04,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 0: [2023-04-28 10:30:04,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-28 10:30:04,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-28 10:30:04,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 17: [2023-04-28 10:30:04,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2023-04-28 10:30:04,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2023-04-28 10:30:04,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 0: [2023-04-28 10:30:04,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-04-28 10:30:04,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-28 10:30:04,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-28 10:30:04,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
0: [2023-04-28 10:30:04,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-28 10:30:04,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 20: [2023-04-28 10:30:04,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 0: [2023-04-28 10:30:04,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 20: [2023-04-28 10:30:04,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-28 10:30:04,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 0: [2023-04-28 10:30:04,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-04-28 10:30:04,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-28 10:30:04,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 17: [2023-04-28 10:30:04,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 17: [2023-04-28 10:30:04,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-28 10:30:04,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
17: [2023-04-28 10:30:04,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2023-04-28 10:30:04,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-28 10:30:04,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 0: [2023-04-28 10:30:04,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-28 10:30:04,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-28 10:30:04,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 0: [2023-04-28 10:30:04,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-04-28 10:30:04,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-28 10:30:04,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 20: [2023-04-28 10:30:04,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2023-04-28 10:30:04,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-28 10:30:04,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
20: [2023-04-28 10:30:04,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2023-04-28 10:30:04,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-28 10:30:04,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 4: [2023-04-28 10:30:04,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 17: [2023-04-28 10:30:04,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 4: [2023-04-28 10:30:04,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 4: [2023-04-28 10:30:04,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-04-28 10:30:04,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-04-28 10:30:04,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 4: [2023-04-28 10:30:04,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-04-28 10:30:04,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-28 10:30:04,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
4: [2023-04-28 10:30:04,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-04-28 10:30:04,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-04-28 10:30:04,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 4: [2023-04-28 10:30:04,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-28 10:30:04,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-28 10:30:04,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 4: [2023-04-28 10:30:04,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 17: [2023-04-28 10:30:04,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 4: [2023-04-28 10:30:04,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 17: [2023-04-28 10:30:04,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 4: [2023-04-28 10:30:04,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 4: [2023-04-28 10:30:04,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-04-28 10:30:04,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-28 10:30:04,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 4: [2023-04-28 10:30:04,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-28 10:30:04,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-04-28 10:30:04,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 8: [2023-04-28 10:30:04,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 8: [2023-04-28 10:30:04,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 
16: [2023-04-28 10:30:04,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-28 10:30:04,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-28 10:30:04,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 
16: [2023-04-28 10:30:04,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-28 10:30:04,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-28 10:30:04,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 16: [2023-04-28 10:30:04,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 16: [2023-04-28 10:30:04,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-28 10:30:04,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-28 10:30:04,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 20: [2023-04-28 10:30:04,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 20: [2023-04-28 10:30:04,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-28 10:30:04,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
20: [2023-04-28 10:30:04,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 20: [2023-04-28 10:30:04,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-28 10:30:04,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 17: [2023-04-28 10:30:04,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-28 10:30:04,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-28 10:30:04,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 17: [2023-04-28 10:30:04,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2023-04-28 10:30:04,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-28 10:30:04,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 12: [2023-04-28 10:30:04,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 12: [2023-04-28 10:30:04,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-28 10:30:04,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
12: [2023-04-28 10:30:04,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-28 10:30:04,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-28 10:30:04,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 12: [2023-04-28 10:30:04,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-28 10:30:04,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-28 10:30:04,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 12: [2023-04-28 10:30:04,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-28 10:30:04,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2023-04-28 10:30:04,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-28 10:30:04,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-28 10:30:04,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 12: [2023-04-28 10:30:04,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
8: [2023-04-28 10:30:04,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-28 10:30:04,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-28 10:30:04,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 8: [2023-04-28 10:30:04,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 8: [2023-04-28 10:30:04,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 8: [2023-04-28 10:30:04,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 8: [2023-04-28 10:30:04,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 8: [2023-04-28 10:30:04,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-28 10:30:04,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 8: [2023-04-28 10:30:04,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 25: [2023-04-28 10:30:04,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-28 10:30:04,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
25: [2023-04-28 10:30:04,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-28 10:30:04,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-28 10:30:04,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-28 10:30:04,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-28 10:30:04,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 25: [2023-04-28 10:30:04,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 16: [2023-04-28 10:30:04,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 
16: [2023-04-28 10:30:04,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-28 10:30:04,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 25: [2023-04-28 10:30:04,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 12: [2023-04-28 10:30:04,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 25: [2023-04-28 10:30:04,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 12: [2023-04-28 10:30:04,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 25: [2023-04-28 10:30:04,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 12: [2023-04-28 10:30:04,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 20: [2023-04-28 10:30:04,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-28 10:30:04,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 20: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 
20: [2023-04-28 10:30:04,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 15: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 15: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 15: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 15: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-28 10:30:04,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 
15: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 15: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 15: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
15: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 15: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 15: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 15: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 15: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 
29: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
29: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-28 10:30:04,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 29: [2023-04-28 10:30:04,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 
23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 
23: [2023-04-28 10:30:04,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2023-04-28 10:30:04,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-28 10:30:04,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-28 10:30:04,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-28 10:30:04,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2023-04-28 10:30:04,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-28 10:30:04,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-28 10:30:04,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 23: [2023-04-28 10:30:04,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 8: [2023-04-28 10:30:04,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 
28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 24: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 
24: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 24: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 24: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 
24: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 26: [2023-04-28 10:30:04,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-28 10:30:04,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-28 10:30:04,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-28 10:30:04,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-28 10:30:04,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 14: [2023-04-28 10:30:04,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-28 10:30:04,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-28 10:30:04,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 24: [2023-04-28 10:30:04,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 
24: [2023-04-28 10:30:04,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-28 10:30:04,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 26: [2023-04-28 10:30:04,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 14: [2023-04-28 10:30:04,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2023-04-28 10:30:04,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-28 10:30:04,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
24: [2023-04-28 10:30:04,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-28 10:30:04,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 26: [2023-04-28 10:30:04,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-28 10:30:04,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 14: [2023-04-28 10:30:04,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 24: [2023-04-28 10:30:04,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 24: [2023-04-28 10:30:04,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-28 10:30:04,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
24: [2023-04-28 10:30:04,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 24: [2023-04-28 10:30:04,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-28 10:30:04,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 14: [2023-04-28 10:30:04,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 24: [2023-04-28 10:30:04,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 24: [2023-04-28 10:30:04,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 26: [2023-04-28 10:30:04,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 24: [2023-04-28 10:30:04,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 24: [2023-04-28 10:30:04,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
24: [2023-04-28 10:30:04,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 14: [2023-04-28 10:30:04,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 
28: [2023-04-28 10:30:04,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-28 10:30:04,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-28 10:30:04,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-28 10:30:04,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-28 10:30:04,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-28 10:30:04,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 28: [2023-04-28 10:30:04,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
28: [2023-04-28 10:30:04,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 28: [2023-04-28 10:30:04,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 0: [2023-04-28 10:30:04,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-04-28 10:30:04,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 
30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 30: [2023-04-28 10:30:04,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-28 10:30:04,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-28 10:30:04,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-28 10:30:04,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
30: [2023-04-28 10:30:04,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-28 10:30:04,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-28 10:30:04,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-28 10:30:04,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 30: [2023-04-28 10:30:04,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-04-28 10:30:04,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-28 10:30:04,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-28 10:30:04,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-28 10:30:04,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-28 10:30:04,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-28 10:30:04,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 6: [2023-04-28 10:30:04,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 6: [2023-04-28 10:30:04,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-28 10:30:04,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-04-28 10:30:04,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-28 10:30:04,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-28 10:30:04,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 6: [2023-04-28 10:30:04,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 10: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 
27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 10: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 10: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 
27: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 10: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 27: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 8: [2023-04-28 10:30:04,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 10: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
27: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 27: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 8: [2023-04-28 10:30:04,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 10: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
27: [2023-04-28 10:30:04,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 27: [2023-04-28 10:30:04,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 
31: [2023-04-28 10:30:04,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-28 10:30:04,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-28 10:30:04,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-28 10:30:04,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-28 10:30:04,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-28 10:30:04,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-28 10:30:04,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 31: [2023-04-28 10:30:04,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 31: [2023-04-28 10:30:04,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-28 10:30:04,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-28 10:30:04,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 
11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-28 10:30:04,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-28 10:30:04,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-28 10:30:04,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2023-04-28 10:30:04,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 11: [2023-04-28 10:30:04,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 11: [2023-04-28 10:30:04,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-28 10:30:04,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
11: [2023-04-28 10:30:04,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 11: [2023-04-28 10:30:04,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 9: [2023-04-28 10:30:04,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 9: [2023-04-28 10:30:04,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 9: [2023-04-28 10:30:04,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 9: [2023-04-28 10:30:04,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 
9: [2023-04-28 10:30:04,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-28 10:30:04,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-28 10:30:04,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-28 10:30:04,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-28 10:30:04,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 9: [2023-04-28 10:30:04,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 9: [2023-04-28 10:30:04,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 9: [2023-04-28 10:30:04,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 9: [2023-04-28 10:30:04,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 
9: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 9: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 9: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 9: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 2: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
2: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 9: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 9: [2023-04-28 10:30:04,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 2: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 9: [2023-04-28 10:30:04,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 8: [2023-04-28 10:30:04,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 
18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-28 10:30:04,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-28 10:30:04,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-28 10:30:04,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-28 10:30:04,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-28 10:30:04,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 18: [2023-04-28 10:30:04,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 18: [2023-04-28 10:30:04,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 8: [2023-04-28 10:30:04,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-28 10:30:04,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 8: [2023-04-28 10:30:04,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 8: [2023-04-28 10:30:04,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-28 10:30:04,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 18: [2023-04-28 10:30:04,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 
18: [2023-04-28 10:30:04,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-28 10:30:04,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-28 10:30:04,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-28 10:30:04,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 18: [2023-04-28 10:30:04,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-04-28 10:30:04,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-28 10:30:04,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-28 10:30:04,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-28 10:30:04,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-28 10:30:04,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-28 10:30:04,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
3: [2023-04-28 10:30:04,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-28 10:30:04,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 13: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 
13: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 13: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 5: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 
21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 21: [2023-04-28 10:30:04,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-28 10:30:04,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-28 10:30:04,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-28 10:30:04,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-28 10:30:04,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-28 10:30:04,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 21: [2023-04-28 10:30:04,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-28 10:30:04,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 21: [2023-04-28 10:30:04,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 5: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 5: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 5: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-28 10:30:04,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 5: [2023-04-28 10:30:04,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 5: [2023-04-28 10:30:04,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-28 10:30:04,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-04-28 10:30:04,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-04-28 10:30:04,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-28 10:30:04,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 5: [2023-04-28 10:30:04,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
5: [2023-04-28 10:30:04,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-04-28 10:30:04,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-28 10:30:04,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 1: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 1: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 1: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 1: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 1: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 
22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 
22: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-28 10:30:04,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 22: [2023-04-28 10:30:04,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 1: [2023-04-28 10:30:04,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-04-28 10:30:04,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-28 10:30:04,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-04-28 10:30:04,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-28 10:30:04,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 1: [2023-04-28 10:30:04,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-28 10:30:04,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-28 10:30:04,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
1: [2023-04-28 10:30:04,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-04-28 10:30:04,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-28 10:30:04,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-28 10:30:04,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-04-28 10:30:04,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-28 10:30:04,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 7: [2023-04-28 10:30:04,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-04-28 10:30:04,659] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-28 10:30:04,659] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 7: [2023-04-28 10:30:04,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-04-28 10:30:04,659] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-28 10:30:04,659] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 7: [2023-04-28 10:30:04,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-28 10:30:04,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-28 10:30:04,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 8: [2023-04-28 10:30:04,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2023-04-28 10:30:04,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-28 10:30:04,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 19: [2023-04-28 10:30:04,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 
19: [2023-04-28 10:30:04,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-28 10:30:04,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-28 10:30:04,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-28 10:30:04,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2023-04-28 10:30:04,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2023-04-28 10:30:04,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 19: [2023-04-28 10:30:04,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 
19: [2023-04-28 10:30:04,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-28 10:30:04,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-28 10:30:04,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-28 10:30:04,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-28 10:30:04,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-28 10:30:04,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-28 10:30:04,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-28 10:30:04,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 19: [2023-04-28 10:30:04,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-28 10:30:04,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 19: [2023-04-28 10:30:04,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
19: [2023-04-28 10:30:04,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 19: [2023-04-28 10:30:04,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 19: [2023-04-28 10:30:04,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 19: [2023-04-28 10:30:04,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 19: [2023-04-28 10:30:04,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 5: [2023-04-28 10:30:04,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-04-28 10:30:04,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-28 10:30:04,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 3: [2023-04-28 10:30:04,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-28 10:30:04,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step320000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-28 10:30:04,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step320000 is ready now! 
0: successfully saved checkpoint at iteration 320000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 2882.64 31: iteration 320100/ 476837 | consumed samples: 81945600 | consumed tokens: 167824588800 | elapsed time per iteration (s): 0.84 | learning rate: 6.468E-05 | global batch size: 256 | lm loss: 2.465019E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 306.017 | TFLOPs: 18.51 | 31: iteration 320200/ 476837 | consumed samples: 81971200 | consumed tokens: 167877017600 | elapsed time per iteration (s): 0.80 | learning rate: 6.463E-05 | global batch size: 256 | lm loss: 2.466831E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 321.697 | TFLOPs: 19.46 | 31: iteration 320300/ 476837 | consumed samples: 81996800 | consumed tokens: 167929446400 | elapsed time per iteration (s): 0.68 | learning rate: 6.458E-05 | global batch size: 256 | lm loss: 2.464684E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.533 | TFLOPs: 22.78 | 31: iteration 320400/ 476837 | consumed samples: 82022400 | consumed tokens: 167981875200 | elapsed time per iteration (s): 0.68 | learning rate: 6.452E-05 | global batch size: 256 | lm loss: 2.463616E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.572 | TFLOPs: 22.78 | 31: iteration 320500/ 476837 | consumed samples: 82048000 | consumed tokens: 168034304000 | elapsed time per iteration (s): 0.68 | learning rate: 6.447E-05 | global batch size: 256 | lm loss: 2.463294E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.544 | TFLOPs: 22.78 | 31: iteration 320600/ 476837 | consumed samples: 82073600 | consumed tokens: 168086732800 | elapsed time per 
iteration (s): 0.68 | learning rate: 6.442E-05 | global batch size: 256 | lm loss: 2.463356E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.694 | TFLOPs: 22.79 | 31: iteration 320700/ 476837 | consumed samples: 82099200 | consumed tokens: 168139161600 | elapsed time per iteration (s): 0.68 | learning rate: 6.437E-05 | global batch size: 256 | lm loss: 2.462403E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.287 | TFLOPs: 22.76 | 31: iteration 320800/ 476837 | consumed samples: 82124800 | consumed tokens: 168191590400 | elapsed time per iteration (s): 0.68 | learning rate: 6.432E-05 | global batch size: 256 | lm loss: 2.463851E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.470 | TFLOPs: 22.71 | 31: iteration 320900/ 476837 | consumed samples: 82150400 | consumed tokens: 168244019200 | elapsed time per iteration (s): 0.68 | learning rate: 6.427E-05 | global batch size: 256 | lm loss: 2.464072E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.253 | TFLOPs: 22.76 | 31: iteration 321000/ 476837 | consumed samples: 82176000 | consumed tokens: 168296448000 | elapsed time per iteration (s): 0.68 | learning rate: 6.421E-05 | global batch size: 256 | lm loss: 2.466441E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.290 | TFLOPs: 22.70 | 31: iteration 321100/ 476837 | consumed samples: 82201600 | consumed tokens: 168348876800 | elapsed time per iteration (s): 0.68 | learning rate: 6.416E-05 | global batch size: 256 | lm loss: 2.460666E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.742 | 
TFLOPs: 22.67 | 31: iteration 321200/ 476837 | consumed samples: 82227200 | consumed tokens: 168401305600 | elapsed time per iteration (s): 0.68 | learning rate: 6.411E-05 | global batch size: 256 | lm loss: 2.463668E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.500 | TFLOPs: 22.72 | 31: iteration 321300/ 476837 | consumed samples: 82252800 | consumed tokens: 168453734400 | elapsed time per iteration (s): 0.68 | learning rate: 6.406E-05 | global batch size: 256 | lm loss: 2.467290E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.825 | TFLOPs: 22.74 | 31: iteration 321400/ 476837 | consumed samples: 82278400 | consumed tokens: 168506163200 | elapsed time per iteration (s): 0.68 | learning rate: 6.401E-05 | global batch size: 256 | lm loss: 2.462268E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.190 | TFLOPs: 22.70 | 31: iteration 321500/ 476837 | consumed samples: 82304000 | consumed tokens: 168558592000 | elapsed time per iteration (s): 0.68 | learning rate: 6.396E-05 | global batch size: 256 | lm loss: 2.463653E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.234 | TFLOPs: 22.76 | 31: iteration 321600/ 476837 | consumed samples: 82329600 | consumed tokens: 168611020800 | elapsed time per iteration (s): 0.68 | learning rate: 6.391E-05 | global batch size: 256 | lm loss: 2.457487E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.376 | TFLOPs: 22.71 | 31: iteration 321700/ 476837 | consumed samples: 82355200 | consumed tokens: 168663449600 | elapsed time per iteration (s): 0.68 | learning rate: 6.385E-05 | global batch size: 256 | lm loss: 2.460578E+00 | grad norm: 
0.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.217 | TFLOPs: 22.76 | 31: iteration 321800/ 476837 | consumed samples: 82380800 | consumed tokens: 168715878400 | elapsed time per iteration (s): 0.68 | learning rate: 6.380E-05 | global batch size: 256 | lm loss: 2.464235E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.893 | TFLOPs: 22.68 | 31: iteration 321900/ 476837 | consumed samples: 82406400 | consumed tokens: 168768307200 | elapsed time per iteration (s): 0.68 | learning rate: 6.375E-05 | global batch size: 256 | lm loss: 2.459370E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.506 | TFLOPs: 22.78 | 0: [2023-04-28 10:53:10,544] [INFO] [logging.py:68:log_dist] [Rank 0] step=322000, skipped=0, lr=[6.369972154132765e-05, 6.369972154132765e-05, 6.369972154132765e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 322000/ 476837 | consumed samples: 82432000 | consumed tokens: 168820736000 | elapsed time per iteration (s): 0.68 | learning rate: 6.370E-05 | global batch size: 256 | lm loss: 2.464388E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.783 | TFLOPs: 22.79 | 0: steps: 322000 loss: 2.4216 iter time (s): 0.690 samples/sec: 371.084 31: iteration 322100/ 476837 | consumed samples: 82457600 | consumed tokens: 168873164800 | elapsed time per iteration (s): 0.68 | learning rate: 6.365E-05 | global batch size: 256 | lm loss: 2.466092E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.699 | TFLOPs: 22.73 | 31: iteration 322200/ 476837 | consumed samples: 82483200 | consumed tokens: 168925593600 | elapsed time per iteration (s): 0.68 | learning rate: 6.360E-05 | global 
batch size: 256 | lm loss: 2.463600E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.420 | TFLOPs: 22.77 | 31: iteration 322300/ 476837 | consumed samples: 82508800 | consumed tokens: 168978022400 | elapsed time per iteration (s): 0.68 | learning rate: 6.355E-05 | global batch size: 256 | lm loss: 2.462091E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.637 | TFLOPs: 22.79 | 31: iteration 322400/ 476837 | consumed samples: 82534400 | consumed tokens: 169030451200 | elapsed time per iteration (s): 0.68 | learning rate: 6.349E-05 | global batch size: 256 | lm loss: 2.464022E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.380 | TFLOPs: 22.77 | 31: iteration 322500/ 476837 | consumed samples: 82560000 | consumed tokens: 169082880000 | elapsed time per iteration (s): 0.68 | learning rate: 6.344E-05 | global batch size: 256 | lm loss: 2.462006E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.837 | TFLOPs: 22.80 | 31: iteration 322600/ 476837 | consumed samples: 82585600 | consumed tokens: 169135308800 | elapsed time per iteration (s): 0.69 | learning rate: 6.339E-05 | global batch size: 256 | lm loss: 2.461429E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.513 | TFLOPs: 22.48 | 31: iteration 322700/ 476837 | consumed samples: 82611200 | consumed tokens: 169187737600 | elapsed time per iteration (s): 0.68 | learning rate: 6.334E-05 | global batch size: 256 | lm loss: 2.459655E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.222 | TFLOPs: 22.70 | 31: iteration 322800/ 476837 | consumed 
samples: 82636800 | consumed tokens: 169240166400 | elapsed time per iteration (s): 0.68 | learning rate: 6.329E-05 | global batch size: 256 | lm loss: 2.459400E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.328 | TFLOPs: 22.77 | 31: iteration 322900/ 476837 | consumed samples: 82662400 | consumed tokens: 169292595200 | elapsed time per iteration (s): 0.68 | learning rate: 6.324E-05 | global batch size: 256 | lm loss: 2.461582E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.582 | TFLOPs: 22.72 | 31: iteration 323000/ 476837 | consumed samples: 82688000 | consumed tokens: 169345024000 | elapsed time per iteration (s): 0.68 | learning rate: 6.319E-05 | global batch size: 256 | lm loss: 2.459502E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.995 | TFLOPs: 22.81 | 31: iteration 323100/ 476837 | consumed samples: 82713600 | consumed tokens: 169397452800 | elapsed time per iteration (s): 0.68 | learning rate: 6.314E-05 | global batch size: 256 | lm loss: 2.457559E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.935 | TFLOPs: 22.80 | 31: iteration 323200/ 476837 | consumed samples: 82739200 | consumed tokens: 169449881600 | elapsed time per iteration (s): 0.69 | learning rate: 6.308E-05 | global batch size: 256 | lm loss: 2.460261E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.282 | TFLOPs: 22.52 | 31: iteration 323300/ 476837 | consumed samples: 82764800 | consumed tokens: 169502310400 | elapsed time per iteration (s): 0.90 | learning rate: 6.303E-05 | global batch size: 256 | lm loss: 2.463916E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 283.708 | TFLOPs: 17.16 | 31: iteration 323400/ 476837 | consumed samples: 82790400 | consumed tokens: 169554739200 | elapsed time per iteration (s): 0.69 | learning rate: 6.298E-05 | global batch size: 256 | lm loss: 2.457994E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 369.277 | TFLOPs: 22.34 | 31: iteration 323500/ 476837 | consumed samples: 82816000 | consumed tokens: 169607168000 | elapsed time per iteration (s): 0.68 | learning rate: 6.293E-05 | global batch size: 256 | lm loss: 2.463722E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.787 | TFLOPs: 22.79 | 31: iteration 323600/ 476837 | consumed samples: 82841600 | consumed tokens: 169659596800 | elapsed time per iteration (s): 0.68 | learning rate: 6.288E-05 | global batch size: 256 | lm loss: 2.457534E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.772 | TFLOPs: 22.79 | 31: iteration 323700/ 476837 | consumed samples: 82867200 | consumed tokens: 169712025600 | elapsed time per iteration (s): 0.68 | learning rate: 6.283E-05 | global batch size: 256 | lm loss: 2.463987E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.780 | TFLOPs: 22.79 | 31: iteration 323800/ 476837 | consumed samples: 82892800 | consumed tokens: 169764454400 | elapsed time per iteration (s): 0.68 | learning rate: 6.278E-05 | global batch size: 256 | lm loss: 2.459793E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.259 | TFLOPs: 22.76 | 31: iteration 323900/ 476837 | consumed samples: 82918400 | consumed tokens: 169816883200 | elapsed time per iteration (s): 0.68 | learning rate: 6.273E-05 
| global batch size: 256 | lm loss: 2.460376E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.781 | TFLOPs: 22.79 | 0: [2023-04-28 11:16:16,042] [INFO] [logging.py:68:log_dist] [Rank 0] step=324000, skipped=0, lr=[6.26766357227575e-05, 6.26766357227575e-05, 6.26766357227575e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 324000/ 476837 | consumed samples: 82944000 | consumed tokens: 169869312000 | elapsed time per iteration (s): 0.68 | learning rate: 6.268E-05 | global batch size: 256 | lm loss: 2.463266E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.646 | TFLOPs: 22.73 | 0: steps: 324000 loss: 2.4203 iter time (s): 0.689 samples/sec: 371.337 31: iteration 324100/ 476837 | consumed samples: 82969600 | consumed tokens: 169921740800 | elapsed time per iteration (s): 0.68 | learning rate: 6.263E-05 | global batch size: 256 | lm loss: 2.456126E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.035 | TFLOPs: 22.69 | 31: iteration 324200/ 476837 | consumed samples: 82995200 | consumed tokens: 169974169600 | elapsed time per iteration (s): 0.68 | learning rate: 6.257E-05 | global batch size: 256 | lm loss: 2.464876E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.067 | TFLOPs: 22.75 | 31: iteration 324300/ 476837 | consumed samples: 83020800 | consumed tokens: 170026598400 | elapsed time per iteration (s): 0.68 | learning rate: 6.252E-05 | global batch size: 256 | lm loss: 2.455950E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.495 | TFLOPs: 22.72 | 31: iteration 324400/ 476837 | consumed samples: 83046400 | consumed tokens: 170079027200 | elapsed time 
per iteration (s): 0.68 | learning rate: 6.247E-05 | global batch size: 256 | lm loss: 2.464972E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.745 | TFLOPs: 22.73 | 31: iteration 324500/ 476837 | consumed samples: 83072000 | consumed tokens: 170131456000 | elapsed time per iteration (s): 0.68 | learning rate: 6.242E-05 | global batch size: 256 | lm loss: 2.463869E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.878 | TFLOPs: 22.74 | 31: iteration 324600/ 476837 | consumed samples: 83097600 | consumed tokens: 170183884800 | elapsed time per iteration (s): 0.68 | learning rate: 6.237E-05 | global batch size: 256 | lm loss: 2.463084E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.970 | TFLOPs: 22.75 | 31: iteration 324700/ 476837 | consumed samples: 83123200 | consumed tokens: 170236313600 | elapsed time per iteration (s): 0.68 | learning rate: 6.232E-05 | global batch size: 256 | lm loss: 2.459021E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.655 | TFLOPs: 22.79 | 31: iteration 324800/ 476837 | consumed samples: 83148800 | consumed tokens: 170288742400 | elapsed time per iteration (s): 0.68 | learning rate: 6.227E-05 | global batch size: 256 | lm loss: 2.459794E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.159 | TFLOPs: 22.76 | 31: iteration 324900/ 476837 | consumed samples: 83174400 | consumed tokens: 170341171200 | elapsed time per iteration (s): 0.68 | learning rate: 6.222E-05 | global batch size: 256 | lm loss: 2.457248E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.727 | 
TFLOPs: 22.73 | 31: iteration 325000/ 476837 | consumed samples: 83200000 | consumed tokens: 170393600000 | elapsed time per iteration (s): 0.68 | learning rate: 6.217E-05 | global batch size: 256 | lm loss: 2.455140E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.727 | TFLOPs: 22.79 | 31: iteration 325100/ 476837 | consumed samples: 83225600 | consumed tokens: 170446028800 | elapsed time per iteration (s): 0.68 | learning rate: 6.212E-05 | global batch size: 256 | lm loss: 2.457211E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.450 | TFLOPs: 22.77 | 31: iteration 325200/ 476837 | consumed samples: 83251200 | consumed tokens: 170498457600 | elapsed time per iteration (s): 0.68 | learning rate: 6.207E-05 | global batch size: 256 | lm loss: 2.457796E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.466 | TFLOPs: 22.71 | 31: iteration 325300/ 476837 | consumed samples: 83276800 | consumed tokens: 170550886400 | elapsed time per iteration (s): 0.68 | learning rate: 6.202E-05 | global batch size: 256 | lm loss: 2.459859E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.700 | TFLOPs: 22.79 | 31: iteration 325400/ 476837 | consumed samples: 83302400 | consumed tokens: 170603315200 | elapsed time per iteration (s): 0.68 | learning rate: 6.197E-05 | global batch size: 256 | lm loss: 2.457091E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.969 | TFLOPs: 22.75 | 31: iteration 325500/ 476837 | consumed samples: 83328000 | consumed tokens: 170655744000 | elapsed time per iteration (s): 0.68 | learning rate: 6.191E-05 | global batch size: 256 | lm loss: 2.455884E+00 | grad norm: 
0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.541 | TFLOPs: 22.78 | 31: iteration 325600/ 476837 | consumed samples: 83353600 | consumed tokens: 170708172800 | elapsed time per iteration (s): 0.68 | learning rate: 6.186E-05 | global batch size: 256 | lm loss: 2.462226E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.632 | TFLOPs: 22.79 | 31: iteration 325700/ 476837 | consumed samples: 83379200 | consumed tokens: 170760601600 | elapsed time per iteration (s): 0.68 | learning rate: 6.181E-05 | global batch size: 256 | lm loss: 2.453800E+00 | grad norm: 0.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.161 | TFLOPs: 22.76 | 31: iteration 325800/ 476837 | consumed samples: 83404800 | consumed tokens: 170813030400 | elapsed time per iteration (s): 0.69 | learning rate: 6.176E-05 | global batch size: 256 | lm loss: 2.456160E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.677 | TFLOPs: 22.61 | 31: iteration 325900/ 476837 | consumed samples: 83430400 | consumed tokens: 170865459200 | elapsed time per iteration (s): 0.68 | learning rate: 6.171E-05 | global batch size: 256 | lm loss: 2.452407E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.485 | TFLOPs: 22.78 | 0: [2023-04-28 11:38:57,914] [INFO] [logging.py:68:log_dist] [Rank 0] step=326000, skipped=0, lr=[6.16619332689164e-05, 6.16619332689164e-05, 6.16619332689164e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 326000/ 476837 | consumed samples: 83456000 | consumed tokens: 170917888000 | elapsed time per iteration (s): 0.68 | learning rate: 6.166E-05 | global batch size: 256 | lm loss: 2.458854E+00 | grad norm: 0.392 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.553 | TFLOPs: 22.72 | 0: steps: 326000 loss: 2.4315 iter time (s): 0.678 samples/sec: 377.566 31: iteration 326100/ 476837 | consumed samples: 83481600 | consumed tokens: 170970316800 | elapsed time per iteration (s): 0.68 | learning rate: 6.161E-05 | global batch size: 256 | lm loss: 2.457680E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.039 | TFLOPs: 22.75 | 31: iteration 326200/ 476837 | consumed samples: 83507200 | consumed tokens: 171022745600 | elapsed time per iteration (s): 0.68 | learning rate: 6.156E-05 | global batch size: 256 | lm loss: 2.456484E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.598 | TFLOPs: 22.78 | 31: iteration 326300/ 476837 | consumed samples: 83532800 | consumed tokens: 171075174400 | elapsed time per iteration (s): 0.68 | learning rate: 6.151E-05 | global batch size: 256 | lm loss: 2.454485E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.470 | TFLOPs: 22.78 | 31: iteration 326400/ 476837 | consumed samples: 83558400 | consumed tokens: 171127603200 | elapsed time per iteration (s): 0.69 | learning rate: 6.146E-05 | global batch size: 256 | lm loss: 2.457644E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.710 | TFLOPs: 22.55 | 31: iteration 326500/ 476837 | consumed samples: 83584000 | consumed tokens: 171180032000 | elapsed time per iteration (s): 0.91 | learning rate: 6.141E-05 | global batch size: 256 | lm loss: 2.459260E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 280.981 | TFLOPs: 17.00 | 31: iteration 326600/ 476837 | consumed samples: 
83609600 | consumed tokens: 171232460800 | elapsed time per iteration (s): 0.69 | learning rate: 6.136E-05 | global batch size: 256 | lm loss: 2.461149E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.560 | TFLOPs: 22.42 | 31: iteration 326700/ 476837 | consumed samples: 83635200 | consumed tokens: 171284889600 | elapsed time per iteration (s): 0.68 | learning rate: 6.131E-05 | global batch size: 256 | lm loss: 2.454763E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.883 | TFLOPs: 22.68 | 31: iteration 326800/ 476837 | consumed samples: 83660800 | consumed tokens: 171337318400 | elapsed time per iteration (s): 0.68 | learning rate: 6.126E-05 | global batch size: 256 | lm loss: 2.457708E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.926 | TFLOPs: 22.68 | 31: iteration 326900/ 476837 | consumed samples: 83686400 | consumed tokens: 171389747200 | elapsed time per iteration (s): 0.68 | learning rate: 6.121E-05 | global batch size: 256 | lm loss: 2.461367E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.634 | TFLOPs: 22.72 | 31: iteration 327000/ 476837 | consumed samples: 83712000 | consumed tokens: 171442176000 | elapsed time per iteration (s): 0.68 | learning rate: 6.116E-05 | global batch size: 256 | lm loss: 2.454123E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.159 | TFLOPs: 22.64 | 31: iteration 327100/ 476837 | consumed samples: 83737600 | consumed tokens: 171494604800 | elapsed time per iteration (s): 0.68 | learning rate: 6.111E-05 | global batch size: 256 | lm loss: 2.457863E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 375.536 | TFLOPs: 22.72 | 31: iteration 327200/ 476837 | consumed samples: 83763200 | consumed tokens: 171547033600 | elapsed time per iteration (s): 0.68 | learning rate: 6.106E-05 | global batch size: 256 | lm loss: 2.455343E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.227 | TFLOPs: 22.76 | 31: iteration 327300/ 476837 | consumed samples: 83788800 | consumed tokens: 171599462400 | elapsed time per iteration (s): 0.68 | learning rate: 6.101E-05 | global batch size: 256 | lm loss: 2.460062E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.602 | TFLOPs: 22.72 | 31: iteration 327400/ 476837 | consumed samples: 83814400 | consumed tokens: 171651891200 | elapsed time per iteration (s): 0.68 | learning rate: 6.096E-05 | global batch size: 256 | lm loss: 2.456478E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.223 | TFLOPs: 22.76 | 31: iteration 327500/ 476837 | consumed samples: 83840000 | consumed tokens: 171704320000 | elapsed time per iteration (s): 0.69 | learning rate: 6.091E-05 | global batch size: 256 | lm loss: 2.454281E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.445 | TFLOPs: 22.47 | 31: iteration 327600/ 476837 | consumed samples: 83865600 | consumed tokens: 171756748800 | elapsed time per iteration (s): 0.68 | learning rate: 6.086E-05 | global batch size: 256 | lm loss: 2.456750E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.333 | TFLOPs: 22.77 | 31: iteration 327700/ 476837 | consumed samples: 83891200 | consumed tokens: 171809177600 | elapsed time per iteration (s): 0.68 | learning rate: 6.081E-05 | global 
batch size: 256 | lm loss: 2.455461E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.500 | TFLOPs: 22.78 | 31: iteration 327800/ 476837 | consumed samples: 83916800 | consumed tokens: 171861606400 | elapsed time per iteration (s): 0.68 | learning rate: 6.076E-05 | global batch size: 256 | lm loss: 2.454679E+00 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.893 | TFLOPs: 22.74 | 31: iteration 327900/ 476837 | consumed samples: 83942400 | consumed tokens: 171914035200 | elapsed time per iteration (s): 0.68 | learning rate: 6.071E-05 | global batch size: 256 | lm loss: 2.455502E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.372 | TFLOPs: 22.77 | 0: [2023-04-28 12:02:05,506] [INFO] [logging.py:68:log_dist] [Rank 0] step=328000, skipped=0, lr=[6.065579393500332e-05, 6.065579393500332e-05, 6.065579393500332e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 328000/ 476837 | consumed samples: 83968000 | consumed tokens: 171966464000 | elapsed time per iteration (s): 0.68 | learning rate: 6.066E-05 | global batch size: 256 | lm loss: 2.458242E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.249 | TFLOPs: 22.76 | 0: steps: 328000 loss: 2.4709 iter time (s): 0.691 samples/sec: 370.510 31: iteration 328100/ 476837 | consumed samples: 83993600 | consumed tokens: 172018892800 | elapsed time per iteration (s): 0.68 | learning rate: 6.061E-05 | global batch size: 256 | lm loss: 2.457368E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.169 | TFLOPs: 22.76 | 31: iteration 328200/ 476837 | consumed samples: 84019200 | consumed tokens: 172071321600 | elapsed time per 
iteration (s): 0.68 | learning rate: 6.056E-05 | global batch size: 256 | lm loss: 2.456464E+00 | grad norm: 0.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.325 | TFLOPs: 22.77 | 31: iteration 328300/ 476837 | consumed samples: 84044800 | consumed tokens: 172123750400 | elapsed time per iteration (s): 0.69 | learning rate: 6.051E-05 | global batch size: 256 | lm loss: 2.451131E+00 | grad norm: 0.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.200 | TFLOPs: 22.58 | 31: iteration 328400/ 476837 | consumed samples: 84070400 | consumed tokens: 172176179200 | elapsed time per iteration (s): 0.68 | learning rate: 6.046E-05 | global batch size: 256 | lm loss: 2.453052E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.067 | TFLOPs: 22.69 | 31: iteration 328500/ 476837 | consumed samples: 84096000 | consumed tokens: 172228608000 | elapsed time per iteration (s): 0.68 | learning rate: 6.041E-05 | global batch size: 256 | lm loss: 2.453930E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.131 | TFLOPs: 22.75 | 31: iteration 328600/ 476837 | consumed samples: 84121600 | consumed tokens: 172281036800 | elapsed time per iteration (s): 0.68 | learning rate: 6.036E-05 | global batch size: 256 | lm loss: 2.452545E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.055 | TFLOPs: 22.75 | 31: iteration 328700/ 476837 | consumed samples: 84147200 | consumed tokens: 172333465600 | elapsed time per iteration (s): 0.68 | learning rate: 6.031E-05 | global batch size: 256 | lm loss: 2.454729E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.778 | 
TFLOPs: 22.79 | 31: iteration 328800/ 476837 | consumed samples: 84172800 | consumed tokens: 172385894400 | elapsed time per iteration (s): 0.68 | learning rate: 6.026E-05 | global batch size: 256 | lm loss: 2.450204E+00 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.731 | TFLOPs: 22.79 | 31: iteration 328900/ 476837 | consumed samples: 84198400 | consumed tokens: 172438323200 | elapsed time per iteration (s): 0.68 | learning rate: 6.021E-05 | global batch size: 256 | lm loss: 2.456456E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.367 | TFLOPs: 22.65 | 31: iteration 329000/ 476837 | consumed samples: 84224000 | consumed tokens: 172490752000 | elapsed time per iteration (s): 0.68 | learning rate: 6.016E-05 | global batch size: 256 | lm loss: 2.457142E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.980 | TFLOPs: 22.75 | 31: iteration 329100/ 476837 | consumed samples: 84249600 | consumed tokens: 172543180800 | elapsed time per iteration (s): 0.68 | learning rate: 6.011E-05 | global batch size: 256 | lm loss: 2.454680E+00 | grad norm: 0.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.694 | TFLOPs: 22.79 | 31: iteration 329200/ 476837 | consumed samples: 84275200 | consumed tokens: 172595609600 | elapsed time per iteration (s): 0.68 | learning rate: 6.006E-05 | global batch size: 256 | lm loss: 2.455843E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.758 | TFLOPs: 22.79 | 31: iteration 329300/ 476837 | consumed samples: 84300800 | consumed tokens: 172648038400 | elapsed time per iteration (s): 0.68 | learning rate: 6.001E-05 | global batch size: 256 | lm loss: 2.456290E+00 | grad norm: 
0.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.829 | TFLOPs: 22.80 | 31: iteration 329400/ 476837 | consumed samples: 84326400 | consumed tokens: 172700467200 | elapsed time per iteration (s): 0.68 | learning rate: 5.996E-05 | global batch size: 256 | lm loss: 2.459049E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.724 | TFLOPs: 22.79 | 31: iteration 329500/ 476837 | consumed samples: 84352000 | consumed tokens: 172752896000 | elapsed time per iteration (s): 0.68 | learning rate: 5.991E-05 | global batch size: 256 | lm loss: 2.449061E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.420 | TFLOPs: 22.77 | 31: iteration 329600/ 476837 | consumed samples: 84377600 | consumed tokens: 172805324800 | elapsed time per iteration (s): 0.68 | learning rate: 5.986E-05 | global batch size: 256 | lm loss: 2.452410E+00 | grad norm: 0.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.884 | TFLOPs: 22.68 | 31: iteration 329700/ 476837 | consumed samples: 84403200 | consumed tokens: 172857753600 | elapsed time per iteration (s): 0.87 | learning rate: 5.981E-05 | global batch size: 256 | lm loss: 2.452826E+00 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 295.029 | TFLOPs: 17.85 | 31: iteration 329800/ 476837 | consumed samples: 84428800 | consumed tokens: 172910182400 | elapsed time per iteration (s): 0.75 | learning rate: 5.976E-05 | global batch size: 256 | lm loss: 2.452231E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 343.078 | TFLOPs: 20.76 | 31: iteration 329900/ 476837 | consumed samples: 84454400 | consumed tokens: 172962611200 | elapsed 
time per iteration (s): 0.68 | learning rate: 5.971E-05 | global batch size: 256 | lm loss: 2.456985E+00 | grad norm: 0.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.305 | TFLOPs: 22.77 | 0: [2023-04-28 12:25:12,528] [INFO] [logging.py:68:log_dist] [Rank 0] step=330000, skipped=0, lr=[5.965839595925496e-05, 5.965839595925496e-05, 5.965839595925496e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 330000/ 476837 | consumed samples: 84480000 | consumed tokens: 173015040000 | elapsed time per iteration (s): 0.68 | learning rate: 5.966E-05 | global batch size: 256 | lm loss: 2.449736E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.070 | TFLOPs: 22.75 | 0: steps: 330000 loss: 2.4905 iter time (s): 0.691 samples/sec: 370.609 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 330000 | lm loss value: 2.951407E+00 | lm loss PPL: 1.913286E+01 | 31: ------------------------------------------------------------------------------------------------- 31: iteration 330100/ 476837 | consumed samples: 84505600 | consumed tokens: 173067468800 | elapsed time per iteration (s): 0.68 | learning rate: 5.961E-05 | global batch size: 256 | lm loss: 2.455318E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.858 | TFLOPs: 22.74 | 31: iteration 330200/ 476837 | consumed samples: 84531200 | consumed tokens: 173119897600 | elapsed time per iteration (s): 0.68 | learning rate: 5.956E-05 | global batch size: 256 | lm loss: 2.456456E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.478 | TFLOPs: 22.78 | 31: iteration 330300/ 476837 | consumed samples: 84556800 | consumed tokens: 173172326400 | 
elapsed time per iteration (s): 0.68 | learning rate: 5.951E-05 | global batch size: 256 | lm loss: 2.458192E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.984 | TFLOPs: 22.81 | 31: iteration 330400/ 476837 | consumed samples: 84582400 | consumed tokens: 173224755200 | elapsed time per iteration (s): 0.68 | learning rate: 5.946E-05 | global batch size: 256 | lm loss: 2.449917E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.614 | TFLOPs: 22.78 | 31: iteration 330500/ 476837 | consumed samples: 84608000 | consumed tokens: 173277184000 | elapsed time per iteration (s): 0.68 | learning rate: 5.941E-05 | global batch size: 256 | lm loss: 2.448598E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.247 | TFLOPs: 22.76 | 31: iteration 330600/ 476837 | consumed samples: 84633600 | consumed tokens: 173329612800 | elapsed time per iteration (s): 0.68 | learning rate: 5.936E-05 | global batch size: 256 | lm loss: 2.455794E+00 | grad norm: 0.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.616 | TFLOPs: 22.78 | 31: iteration 330700/ 476837 | consumed samples: 84659200 | consumed tokens: 173382041600 | elapsed time per iteration (s): 0.68 | learning rate: 5.931E-05 | global batch size: 256 | lm loss: 2.455719E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.971 | TFLOPs: 22.81 | 31: iteration 330800/ 476837 | consumed samples: 84684800 | consumed tokens: 173434470400 | elapsed time per iteration (s): 0.68 | learning rate: 5.926E-05 | global batch size: 256 | lm loss: 2.454801E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
376.998 | TFLOPs: 22.81 | 31: iteration 330900/ 476837 | consumed samples: 84710400 | consumed tokens: 173486899200 | elapsed time per iteration (s): 0.68 | learning rate: 5.921E-05 | global batch size: 256 | lm loss: 2.455960E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.906 | TFLOPs: 22.68 | 31: iteration 331000/ 476837 | consumed samples: 84736000 | consumed tokens: 173539328000 | elapsed time per iteration (s): 0.68 | learning rate: 5.916E-05 | global batch size: 256 | lm loss: 2.450139E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.367 | TFLOPs: 22.77 | 31: iteration 331100/ 476837 | consumed samples: 84761600 | consumed tokens: 173591756800 | elapsed time per iteration (s): 0.68 | learning rate: 5.911E-05 | global batch size: 256 | lm loss: 2.453941E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.719 | TFLOPs: 22.79 | 31: iteration 331200/ 476837 | consumed samples: 84787200 | consumed tokens: 173644185600 | elapsed time per iteration (s): 0.68 | learning rate: 5.906E-05 | global batch size: 256 | lm loss: 2.456447E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.154 | TFLOPs: 22.82 | 31: iteration 331300/ 476837 | consumed samples: 84812800 | consumed tokens: 173696614400 | elapsed time per iteration (s): 0.68 | learning rate: 5.901E-05 | global batch size: 256 | lm loss: 2.452165E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.140 | TFLOPs: 22.82 | 31: iteration 331400/ 476837 | consumed samples: 84838400 | consumed tokens: 173749043200 | elapsed time per iteration (s): 0.68 | learning rate: 5.897E-05 | global batch size: 256 | lm loss: 2.451966E+00 | 
grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.911 | TFLOPs: 22.74 | 31: iteration 331500/ 476837 | consumed samples: 84864000 | consumed tokens: 173801472000 | elapsed time per iteration (s): 0.68 | learning rate: 5.892E-05 | global batch size: 256 | lm loss: 2.451625E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.684 | TFLOPs: 22.73 | 31: iteration 331600/ 476837 | consumed samples: 84889600 | consumed tokens: 173853900800 | elapsed time per iteration (s): 0.68 | learning rate: 5.887E-05 | global batch size: 256 | lm loss: 2.451000E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.067 | TFLOPs: 22.81 | 31: iteration 331700/ 476837 | consumed samples: 84915200 | consumed tokens: 173906329600 | elapsed time per iteration (s): 0.68 | learning rate: 5.882E-05 | global batch size: 256 | lm loss: 2.452888E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.042 | TFLOPs: 22.81 | 31: iteration 331800/ 476837 | consumed samples: 84940800 | consumed tokens: 173958758400 | elapsed time per iteration (s): 0.68 | learning rate: 5.877E-05 | global batch size: 256 | lm loss: 2.450045E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.075 | TFLOPs: 22.81 | 31: iteration 331900/ 476837 | consumed samples: 84966400 | consumed tokens: 174011187200 | elapsed time per iteration (s): 0.68 | learning rate: 5.872E-05 | global batch size: 256 | lm loss: 2.454580E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.011 | TFLOPs: 22.81 | 0: [2023-04-28 12:47:52,104] [INFO] [logging.py:68:log_dist] [Rank 0] step=332000, 
skipped=0, lr=[5.8669916031370796e-05, 5.8669916031370796e-05, 5.8669916031370796e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 332000/ 476837 | consumed samples: 84992000 | consumed tokens: 174063616000 | elapsed time per iteration (s): 0.68 | learning rate: 5.867E-05 | global batch size: 256 | lm loss: 2.455101E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.944 | TFLOPs: 22.80 | 0: steps: 332000 loss: 2.4206 iter time (s): 0.676 samples/sec: 378.426 31: iteration 332100/ 476837 | consumed samples: 85017600 | consumed tokens: 174116044800 | elapsed time per iteration (s): 0.68 | learning rate: 5.862E-05 | global batch size: 256 | lm loss: 2.450772E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.927 | TFLOPs: 22.68 | 31: iteration 332200/ 476837 | consumed samples: 85043200 | consumed tokens: 174168473600 | elapsed time per iteration (s): 0.68 | learning rate: 5.857E-05 | global batch size: 256 | lm loss: 2.452040E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.549 | TFLOPs: 22.78 | 31: iteration 332300/ 476837 | consumed samples: 85068800 | consumed tokens: 174220902400 | elapsed time per iteration (s): 0.68 | learning rate: 5.852E-05 | global batch size: 256 | lm loss: 2.449642E+00 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.646 | TFLOPs: 22.73 | 31: iteration 332400/ 476837 | consumed samples: 85094400 | consumed tokens: 174273331200 | elapsed time per iteration (s): 0.68 | learning rate: 5.847E-05 | global batch size: 256 | lm loss: 2.452887E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.830 | TFLOPs: 22.74 | 31: iteration 332500/ 
476837 | consumed samples: 85120000 | consumed tokens: 174325760000 | elapsed time per iteration (s): 0.68 | learning rate: 5.842E-05 | global batch size: 256 | lm loss: 2.454499E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.862 | TFLOPs: 22.68 | 31: iteration 332600/ 476837 | consumed samples: 85145600 | consumed tokens: 174378188800 | elapsed time per iteration (s): 0.68 | learning rate: 5.838E-05 | global batch size: 256 | lm loss: 2.451994E+00 | grad norm: 0.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.107 | TFLOPs: 22.75 | 31: iteration 332700/ 476837 | consumed samples: 85171200 | consumed tokens: 174430617600 | elapsed time per iteration (s): 0.68 | learning rate: 5.833E-05 | global batch size: 256 | lm loss: 2.450379E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.179 | TFLOPs: 22.64 | 31: iteration 332800/ 476837 | consumed samples: 85196800 | consumed tokens: 174483046400 | elapsed time per iteration (s): 0.68 | learning rate: 5.828E-05 | global batch size: 256 | lm loss: 2.448660E+00 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.723 | TFLOPs: 22.73 | 31: iteration 332900/ 476837 | consumed samples: 85222400 | consumed tokens: 174535475200 | elapsed time per iteration (s): 0.72 | learning rate: 5.823E-05 | global batch size: 256 | lm loss: 2.453111E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 355.077 | TFLOPs: 21.48 | 31: iteration 333000/ 476837 | consumed samples: 85248000 | consumed tokens: 174587904000 | elapsed time per iteration (s): 0.91 | learning rate: 5.818E-05 | global batch size: 256 | lm loss: 2.457053E+00 | grad norm: 0.431 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 281.535 | TFLOPs: 17.03 | 31: iteration 333100/ 476837 | consumed samples: 85273600 | consumed tokens: 174640332800 | elapsed time per iteration (s): 0.69 | learning rate: 5.813E-05 | global batch size: 256 | lm loss: 2.451639E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.284 | TFLOPs: 22.58 | 31: iteration 333200/ 476837 | consumed samples: 85299200 | consumed tokens: 174692761600 | elapsed time per iteration (s): 0.68 | learning rate: 5.808E-05 | global batch size: 256 | lm loss: 2.452099E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.386 | TFLOPs: 22.77 | 31: iteration 333300/ 476837 | consumed samples: 85324800 | consumed tokens: 174745190400 | elapsed time per iteration (s): 0.68 | learning rate: 5.803E-05 | global batch size: 256 | lm loss: 2.451620E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.311 | TFLOPs: 22.77 | 31: iteration 333400/ 476837 | consumed samples: 85350400 | consumed tokens: 174797619200 | elapsed time per iteration (s): 0.68 | learning rate: 5.798E-05 | global batch size: 256 | lm loss: 2.454016E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.208 | TFLOPs: 22.76 | 31: iteration 333500/ 476837 | consumed samples: 85376000 | consumed tokens: 174850048000 | elapsed time per iteration (s): 0.68 | learning rate: 5.793E-05 | global batch size: 256 | lm loss: 2.448159E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.405 | TFLOPs: 22.77 | 31: iteration 333600/ 476837 | consumed samples: 85401600 | consumed tokens: 174902476800 | elapsed time per iteration (s): 0.68 | 
learning rate: 5.789E-05 | global batch size: 256 | lm loss: 2.448018E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.817 | TFLOPs: 22.80 | 31: iteration 333700/ 476837 | consumed samples: 85427200 | consumed tokens: 174954905600 | elapsed time per iteration (s): 0.68 | learning rate: 5.784E-05 | global batch size: 256 | lm loss: 2.449605E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.787 | TFLOPs: 22.79 | 31: iteration 333800/ 476837 | consumed samples: 85452800 | consumed tokens: 175007334400 | elapsed time per iteration (s): 0.68 | learning rate: 5.779E-05 | global batch size: 256 | lm loss: 2.448504E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.009 | TFLOPs: 22.81 | 31: iteration 333900/ 476837 | consumed samples: 85478400 | consumed tokens: 175059763200 | elapsed time per iteration (s): 0.73 | learning rate: 5.774E-05 | global batch size: 256 | lm loss: 2.451174E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 352.743 | TFLOPs: 21.34 | 0: [2023-04-28 13:11:06,772] [INFO] [logging.py:68:log_dist] [Rank 0] step=334000, skipped=0, lr=[5.7690529261212274e-05, 5.7690529261212274e-05, 5.7690529261212274e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 334000/ 476837 | consumed samples: 85504000 | consumed tokens: 175112192000 | elapsed time per iteration (s): 0.69 | learning rate: 5.769E-05 | global batch size: 256 | lm loss: 2.452637E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.159 | TFLOPs: 22.39 | 0: steps: 334000 loss: 2.4630 iter time (s): 0.694 samples/sec: 368.848 31: iteration 334100/ 476837 | consumed samples: 85529600 | consumed 
tokens: 175164620800 | elapsed time per iteration (s): 0.68 | learning rate: 5.764E-05 | global batch size: 256 | lm loss: 2.450383E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.655 | TFLOPs: 22.79 | 31: iteration 334200/ 476837 | consumed samples: 85555200 | consumed tokens: 175217049600 | elapsed time per iteration (s): 0.68 | learning rate: 5.759E-05 | global batch size: 256 | lm loss: 2.449648E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.738 | TFLOPs: 22.73 | 31: iteration 334300/ 476837 | consumed samples: 85580800 | consumed tokens: 175269478400 | elapsed time per iteration (s): 0.68 | learning rate: 5.754E-05 | global batch size: 256 | lm loss: 2.451562E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.825 | TFLOPs: 22.80 | 31: iteration 334400/ 476837 | consumed samples: 85606400 | consumed tokens: 175321907200 | elapsed time per iteration (s): 0.68 | learning rate: 5.750E-05 | global batch size: 256 | lm loss: 2.445582E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.836 | TFLOPs: 22.80 | 31: iteration 334500/ 476837 | consumed samples: 85632000 | consumed tokens: 175374336000 | elapsed time per iteration (s): 0.68 | learning rate: 5.745E-05 | global batch size: 256 | lm loss: 2.448562E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.145 | TFLOPs: 22.76 | 31: iteration 334600/ 476837 | consumed samples: 85657600 | consumed tokens: 175426764800 | elapsed time per iteration (s): 0.68 | learning rate: 5.740E-05 | global batch size: 256 | lm loss: 2.445988E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 376.333 | TFLOPs: 22.77 | 31: iteration 334700/ 476837 | consumed samples: 85683200 | consumed tokens: 175479193600 | elapsed time per iteration (s): 0.68 | learning rate: 5.735E-05 | global batch size: 256 | lm loss: 2.447143E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.455 | TFLOPs: 22.77 | 31: iteration 334800/ 476837 | consumed samples: 85708800 | consumed tokens: 175531622400 | elapsed time per iteration (s): 0.68 | learning rate: 5.730E-05 | global batch size: 256 | lm loss: 2.449916E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.003 | TFLOPs: 22.81 | 31: iteration 334900/ 476837 | consumed samples: 85734400 | consumed tokens: 175584051200 | elapsed time per iteration (s): 0.68 | learning rate: 5.725E-05 | global batch size: 256 | lm loss: 2.449594E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.558 | TFLOPs: 22.78 | 31: iteration 335000/ 476837 | consumed samples: 85760000 | consumed tokens: 175636480000 | elapsed time per iteration (s): 0.68 | learning rate: 5.720E-05 | global batch size: 256 | lm loss: 2.450357E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.877 | TFLOPs: 22.80 | 31: iteration 335100/ 476837 | consumed samples: 85785600 | consumed tokens: 175688908800 | elapsed time per iteration (s): 0.68 | learning rate: 5.716E-05 | global batch size: 256 | lm loss: 2.449239E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.949 | TFLOPs: 22.80 | 31: iteration 335200/ 476837 | consumed samples: 85811200 | consumed tokens: 175741337600 | elapsed time per iteration (s): 0.68 | learning rate: 5.711E-05 | global batch size: 256 | 
lm loss: 2.448223E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.658 | TFLOPs: 22.79 | 31: iteration 335300/ 476837 | consumed samples: 85836800 | consumed tokens: 175793766400 | elapsed time per iteration (s): 0.68 | learning rate: 5.706E-05 | global batch size: 256 | lm loss: 2.444342E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.349 | TFLOPs: 22.77 | 31: iteration 335400/ 476837 | consumed samples: 85862400 | consumed tokens: 175846195200 | elapsed time per iteration (s): 0.68 | learning rate: 5.701E-05 | global batch size: 256 | lm loss: 2.457773E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.341 | TFLOPs: 22.65 | 31: iteration 335500/ 476837 | consumed samples: 85888000 | consumed tokens: 175898624000 | elapsed time per iteration (s): 0.68 | learning rate: 5.696E-05 | global batch size: 256 | lm loss: 2.447451E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.802 | TFLOPs: 22.80 | 31: iteration 335600/ 476837 | consumed samples: 85913600 | consumed tokens: 175951052800 | elapsed time per iteration (s): 0.68 | learning rate: 5.691E-05 | global batch size: 256 | lm loss: 2.448857E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.439 | TFLOPs: 22.77 | 31: iteration 335700/ 476837 | consumed samples: 85939200 | consumed tokens: 176003481600 | elapsed time per iteration (s): 0.68 | learning rate: 5.687E-05 | global batch size: 256 | lm loss: 2.447361E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.331 | TFLOPs: 22.65 | 31: iteration 335800/ 476837 | consumed samples: 85964800 | 
consumed tokens: 176055910400 | elapsed time per iteration (s): 0.68 | learning rate: 5.682E-05 | global batch size: 256 | lm loss: 2.450833E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.822 | TFLOPs: 22.80 | 31: iteration 335900/ 476837 | consumed samples: 85990400 | consumed tokens: 176108339200 | elapsed time per iteration (s): 0.69 | learning rate: 5.677E-05 | global batch size: 256 | lm loss: 2.444302E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.755 | TFLOPs: 22.43 | 0: [2023-04-28 13:33:48,855] [INFO] [logging.py:68:log_dist] [Rank 0] step=336000, skipped=0, lr=[5.672040914778207e-05, 5.672040914778207e-05, 5.672040914778207e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 336000/ 476837 | consumed samples: 86016000 | consumed tokens: 176160768000 | elapsed time per iteration (s): 0.69 | learning rate: 5.672E-05 | global batch size: 256 | lm loss: 2.448734E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.161 | TFLOPs: 22.58 | 0: steps: 336000 loss: 2.4768 iter time (s): 0.677 samples/sec: 378.158 31: iteration 336100/ 476837 | consumed samples: 86041600 | consumed tokens: 176213196800 | elapsed time per iteration (s): 0.68 | learning rate: 5.667E-05 | global batch size: 256 | lm loss: 2.448092E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.495 | TFLOPs: 22.78 | 31: iteration 336200/ 476837 | consumed samples: 86067200 | consumed tokens: 176265625600 | elapsed time per iteration (s): 0.75 | learning rate: 5.662E-05 | global batch size: 256 | lm loss: 2.448843E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 342.339 | TFLOPs: 20.71 | 31: 
iteration 336300/ 476837 | consumed samples: 86092800 | consumed tokens: 176318054400 | elapsed time per iteration (s): 0.89 | learning rate: 5.658E-05 | global batch size: 256 | lm loss: 2.446880E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 288.379 | TFLOPs: 17.45 | 31: iteration 336400/ 476837 | consumed samples: 86118400 | consumed tokens: 176370483200 | elapsed time per iteration (s): 0.68 | learning rate: 5.653E-05 | global batch size: 256 | lm loss: 2.448139E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.759 | TFLOPs: 22.79 | 31: iteration 336500/ 476837 | consumed samples: 86144000 | consumed tokens: 176422912000 | elapsed time per iteration (s): 0.68 | learning rate: 5.648E-05 | global batch size: 256 | lm loss: 2.452081E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.620 | TFLOPs: 22.72 | 31: iteration 336600/ 476837 | consumed samples: 86169600 | consumed tokens: 176475340800 | elapsed time per iteration (s): 0.68 | learning rate: 5.643E-05 | global batch size: 256 | lm loss: 2.448810E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.489 | TFLOPs: 22.78 | 31: iteration 336700/ 476837 | consumed samples: 86195200 | consumed tokens: 176527769600 | elapsed time per iteration (s): 0.68 | learning rate: 5.638E-05 | global batch size: 256 | lm loss: 2.447119E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.076 | TFLOPs: 22.75 | 31: iteration 336800/ 476837 | consumed samples: 86220800 | consumed tokens: 176580198400 | elapsed time per iteration (s): 0.68 | learning rate: 5.633E-05 | global batch size: 256 | lm loss: 2.448373E+00 | grad norm: 0.392 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.476 | TFLOPs: 22.78 | 31: iteration 336900/ 476837 | consumed samples: 86246400 | consumed tokens: 176632627200 | elapsed time per iteration (s): 0.68 | learning rate: 5.629E-05 | global batch size: 256 | lm loss: 2.446870E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.359 | TFLOPs: 22.77 | 31: iteration 337000/ 476837 | consumed samples: 86272000 | consumed tokens: 176685056000 | elapsed time per iteration (s): 0.68 | learning rate: 5.624E-05 | global batch size: 256 | lm loss: 2.447276E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.346 | TFLOPs: 22.71 | 31: iteration 337100/ 476837 | consumed samples: 86297600 | consumed tokens: 176737484800 | elapsed time per iteration (s): 0.68 | learning rate: 5.619E-05 | global batch size: 256 | lm loss: 2.447614E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.136 | TFLOPs: 22.69 | 31: iteration 337200/ 476837 | consumed samples: 86323200 | consumed tokens: 176789913600 | elapsed time per iteration (s): 0.68 | learning rate: 5.614E-05 | global batch size: 256 | lm loss: 2.450134E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.420 | TFLOPs: 22.77 | 31: iteration 337300/ 476837 | consumed samples: 86348800 | consumed tokens: 176842342400 | elapsed time per iteration (s): 0.70 | learning rate: 5.609E-05 | global batch size: 256 | lm loss: 2.444829E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.494 | TFLOPs: 22.17 | 31: iteration 337400/ 476837 | consumed samples: 86374400 | consumed tokens: 176894771200 | elapsed time per iteration 
(s): 0.68 | learning rate: 5.605E-05 | global batch size: 256 | lm loss: 2.451161E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.327 | TFLOPs: 22.77 | 31: iteration 337500/ 476837 | consumed samples: 86400000 | consumed tokens: 176947200000 | elapsed time per iteration (s): 0.68 | learning rate: 5.600E-05 | global batch size: 256 | lm loss: 2.446723E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.705 | TFLOPs: 22.79 | 31: iteration 337600/ 476837 | consumed samples: 86425600 | consumed tokens: 176999628800 | elapsed time per iteration (s): 0.70 | learning rate: 5.595E-05 | global batch size: 256 | lm loss: 2.450990E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.521 | TFLOPs: 22.17 | 31: iteration 337700/ 476837 | consumed samples: 86451200 | consumed tokens: 177052057600 | elapsed time per iteration (s): 0.68 | learning rate: 5.590E-05 | global batch size: 256 | lm loss: 2.449539E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.159 | TFLOPs: 22.76 | 31: iteration 337800/ 476837 | consumed samples: 86476800 | consumed tokens: 177104486400 | elapsed time per iteration (s): 0.68 | learning rate: 5.586E-05 | global batch size: 256 | lm loss: 2.447936E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.810 | TFLOPs: 22.74 | 31: iteration 337900/ 476837 | consumed samples: 86502400 | consumed tokens: 177156915200 | elapsed time per iteration (s): 0.68 | learning rate: 5.581E-05 | global batch size: 256 | lm loss: 2.445667E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.272 | TFLOPs: 22.76 | 
0: [2023-04-28 13:57:01,240] [INFO] [logging.py:68:log_dist] [Rank 0] step=338000, skipped=0, lr=[5.57597275484886e-05, 5.57597275484886e-05, 5.57597275484886e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 338000/ 476837 | consumed samples: 86528000 | consumed tokens: 177209344000 | elapsed time per iteration (s): 0.68 | learning rate: 5.576E-05 | global batch size: 256 | lm loss: 2.445101E+00 | grad norm: 0.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.809 | TFLOPs: 22.68 | 0: steps: 338000 loss: 2.4269 iter time (s): 0.692 samples/sec: 369.741 31: iteration 338100/ 476837 | consumed samples: 86553600 | consumed tokens: 177261772800 | elapsed time per iteration (s): 0.69 | learning rate: 5.571E-05 | global batch size: 256 | lm loss: 2.450130E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.683 | TFLOPs: 22.61 | 31: iteration 338200/ 476837 | consumed samples: 86579200 | consumed tokens: 177314201600 | elapsed time per iteration (s): 0.68 | learning rate: 5.566E-05 | global batch size: 256 | lm loss: 2.445745E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.977 | TFLOPs: 22.75 | 31: iteration 338300/ 476837 | consumed samples: 86604800 | consumed tokens: 177366630400 | elapsed time per iteration (s): 0.69 | learning rate: 5.562E-05 | global batch size: 256 | lm loss: 2.446450E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.511 | TFLOPs: 22.54 | 31: iteration 338400/ 476837 | consumed samples: 86630400 | consumed tokens: 177419059200 | elapsed time per iteration (s): 0.68 | learning rate: 5.557E-05 | global batch size: 256 | lm loss: 2.444525E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 375.303 | TFLOPs: 22.70 | 31: iteration 338500/ 476837 | consumed samples: 86656000 | consumed tokens: 177471488000 | elapsed time per iteration (s): 0.68 | learning rate: 5.552E-05 | global batch size: 256 | lm loss: 2.444709E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.921 | TFLOPs: 22.74 | 31: iteration 338600/ 476837 | consumed samples: 86681600 | consumed tokens: 177523916800 | elapsed time per iteration (s): 0.69 | learning rate: 5.547E-05 | global batch size: 256 | lm loss: 2.441753E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.043 | TFLOPs: 22.51 | 31: iteration 338700/ 476837 | consumed samples: 86707200 | consumed tokens: 177576345600 | elapsed time per iteration (s): 0.68 | learning rate: 5.543E-05 | global batch size: 256 | lm loss: 2.446293E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.722 | TFLOPs: 22.79 | 31: iteration 338800/ 476837 | consumed samples: 86732800 | consumed tokens: 177628774400 | elapsed time per iteration (s): 0.68 | learning rate: 5.538E-05 | global batch size: 256 | lm loss: 2.444524E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.792 | TFLOPs: 22.73 | 31: iteration 338900/ 476837 | consumed samples: 86758400 | consumed tokens: 177681203200 | elapsed time per iteration (s): 0.68 | learning rate: 5.533E-05 | global batch size: 256 | lm loss: 2.447984E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.628 | TFLOPs: 22.79 | 31: iteration 339000/ 476837 | consumed samples: 86784000 | consumed tokens: 177733632000 | elapsed time per iteration (s): 0.68 | learning rate: 5.528E-05 | global batch size: 256 | 
lm loss: 2.447521E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.765 | TFLOPs: 22.79 | 31: iteration 339100/ 476837 | consumed samples: 86809600 | consumed tokens: 177786060800 | elapsed time per iteration (s): 0.68 | learning rate: 5.524E-05 | global batch size: 256 | lm loss: 2.445685E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.272 | TFLOPs: 22.76 | 31: iteration 339200/ 476837 | consumed samples: 86835200 | consumed tokens: 177838489600 | elapsed time per iteration (s): 0.68 | learning rate: 5.519E-05 | global batch size: 256 | lm loss: 2.445443E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.775 | TFLOPs: 22.79 | 31: iteration 339300/ 476837 | consumed samples: 86860800 | consumed tokens: 177890918400 | elapsed time per iteration (s): 0.68 | learning rate: 5.514E-05 | global batch size: 256 | lm loss: 2.448820E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.784 | TFLOPs: 22.79 | 31: iteration 339400/ 476837 | consumed samples: 86886400 | consumed tokens: 177943347200 | elapsed time per iteration (s): 0.68 | learning rate: 5.509E-05 | global batch size: 256 | lm loss: 2.446333E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.703 | TFLOPs: 22.79 | 31: iteration 339500/ 476837 | consumed samples: 86912000 | consumed tokens: 177995776000 | elapsed time per iteration (s): 0.69 | learning rate: 5.505E-05 | global batch size: 256 | lm loss: 2.439847E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.503 | TFLOPs: 22.60 | 31: iteration 339600/ 476837 | consumed samples: 86937600 | 
consumed tokens: 178048204800 | elapsed time per iteration (s): 0.94 | learning rate: 5.500E-05 | global batch size: 256 | lm loss: 2.442636E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 273.492 | TFLOPs: 16.55 | 31: iteration 339700/ 476837 | consumed samples: 86963200 | consumed tokens: 178100633600 | elapsed time per iteration (s): 0.69 | learning rate: 5.495E-05 | global batch size: 256 | lm loss: 2.444025E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.365 | TFLOPs: 22.47 | 31: iteration 339800/ 476837 | consumed samples: 86988800 | consumed tokens: 178153062400 | elapsed time per iteration (s): 0.68 | learning rate: 5.490E-05 | global batch size: 256 | lm loss: 2.442106E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.083 | TFLOPs: 22.75 | 31: iteration 339900/ 476837 | consumed samples: 87014400 | consumed tokens: 178205491200 | elapsed time per iteration (s): 0.68 | learning rate: 5.486E-05 | global batch size: 256 | lm loss: 2.446776E+00 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.503 | TFLOPs: 22.78 | 0: [2023-04-28 14:20:10,654] [INFO] [logging.py:68:log_dist] [Rank 0] step=340000, skipped=0, lr=[5.480865464870133e-05, 5.480865464870133e-05, 5.480865464870133e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 340000/ 476837 | consumed samples: 87040000 | consumed tokens: 178257920000 | elapsed time per iteration (s): 0.68 | learning rate: 5.481E-05 | global batch size: 256 | lm loss: 2.441988E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.738 | TFLOPs: 22.79 | 0: steps: 340000 loss: 2.4409 iter time (s): 0.692 samples/sec: 370.123 31: 
------------------------------------------------------------------------------------------------- 31: validation loss at iteration 340000 | lm loss value: 2.962118E+00 | lm loss PPL: 1.933888E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 340000 to checkpoints_1b1250b1b5 0: [2023-04-28 14:20:10,919] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step340000 is begin to save! 0: [2023-04-28 14:20:10,926] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_01-model_00-model_states.pt... 0: [2023-04-28 14:20:11,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_01-model_00-model_states.pt. 0: [2023-04-28 14:20:11,202] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_03-model_00-model_states.pt... 0: [2023-04-28 14:20:11,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_03-model_00-model_states.pt. 0: [2023-04-28 14:20:11,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_04-model_00-model_states.pt... 0: [2023-04-28 14:20:11,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_04-model_00-model_states.pt. 0: [2023-04-28 14:20:11,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_05-model_00-model_states.pt... 0: [2023-04-28 14:20:11,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_05-model_00-model_states.pt. 0: [2023-04-28 14:20:11,479] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_06-model_00-model_states.pt... 
0: [2023-04-28 14:20:11,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_06-model_00-model_states.pt. 0: [2023-04-28 14:20:11,569] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_07-model_00-model_states.pt... 0: [2023-04-28 14:20:11,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_07-model_00-model_states.pt. 0: [2023-04-28 14:20:11,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_08-model_00-model_states.pt... 0: [2023-04-28 14:20:11,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_08-model_00-model_states.pt. 0: [2023-04-28 14:20:11,750] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_09-model_00-model_states.pt... 0: [2023-04-28 14:20:11,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_09-model_00-model_states.pt. 0: [2023-04-28 14:20:11,846] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_10-model_00-model_states.pt... 0: [2023-04-28 14:20:11,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_10-model_00-model_states.pt. 0: [2023-04-28 14:20:11,925] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_11-model_00-model_states.pt... 0: [2023-04-28 14:20:12,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_11-model_00-model_states.pt. 0: [2023-04-28 14:20:12,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_12-model_00-model_states.pt... 
0: [2023-04-28 14:20:12,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_12-model_00-model_states.pt. 0: [2023-04-28 14:20:12,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_13-model_00-model_states.pt... 0: [2023-04-28 14:20:12,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_13-model_00-model_states.pt. 0: [2023-04-28 14:20:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_14-model_00-model_states.pt... 0: [2023-04-28 14:20:12,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_14-model_00-model_states.pt. 0: [2023-04-28 14:20:12,281] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_15-model_00-model_states.pt... 0: [2023-04-28 14:20:12,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_15-model_00-model_states.pt. 0: [2023-04-28 14:20:12,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_16-model_00-model_states.pt... 0: [2023-04-28 14:20:12,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_16-model_00-model_states.pt. 0: [2023-04-28 14:20:12,465] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_17-model_00-model_states.pt... 0: [2023-04-28 14:20:12,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_17-model_00-model_states.pt. 0: [2023-04-28 14:20:12,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_18-model_00-model_states.pt... 
0: [2023-04-28 14:20:12,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_18-model_00-model_states.pt. 0: [2023-04-28 14:20:12,647] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_19-model_00-model_states.pt... 0: [2023-04-28 14:20:12,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_19-model_00-model_states.pt. 0: [2023-04-28 14:20:12,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_20-model_00-model_states.pt... 0: [2023-04-28 14:20:12,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_20-model_00-model_states.pt. 0: [2023-04-28 14:20:12,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_21-model_00-model_states.pt... 0: [2023-04-28 14:20:12,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_21-model_00-model_states.pt. 0: [2023-04-28 14:20:12,908] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_22-model_00-model_states.pt... 0: [2023-04-28 14:20:12,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_22-model_00-model_states.pt. 0: [2023-04-28 14:20:12,999] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_23-model_00-model_states.pt... 0: [2023-04-28 14:20:13,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_23-model_00-model_states.pt. 0: [2023-04-28 14:20:13,074] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_24-model_00-model_states.pt... 
0: [2023-04-28 14:20:13,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_24-model_00-model_states.pt. 0: [2023-04-28 14:20:13,163] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_25-model_00-model_states.pt... 0: [2023-04-28 14:20:13,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_25-model_00-model_states.pt. 0: [2023-04-28 14:20:13,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_26-model_00-model_states.pt... 0: [2023-04-28 14:20:13,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_26-model_00-model_states.pt. 0: [2023-04-28 14:20:13,328] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_27-model_00-model_states.pt... 0: [2023-04-28 14:20:13,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_27-model_00-model_states.pt. 0: [2023-04-28 14:20:13,417] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_28-model_00-model_states.pt... 0: [2023-04-28 14:20:13,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_28-model_00-model_states.pt. 0: [2023-04-28 14:20:13,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/layer_30-model_00-model_states.pt... 0: [2023-04-28 14:20:13,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/layer_30-model_00-model_states.pt. 
0: [2023-04-28 14:20:13,510] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step340000/mp_rank_00_model_states.pt 0: [2023-04-28 14:20:13,510] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/mp_rank_00_model_states.pt... 0: [2023-04-28 14:20:13,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/mp_rank_00_model_states.pt. 0: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 4: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 7: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
5: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 2: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 9: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 9: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 12: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 12: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 13: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 13: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 
20: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 20: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 20: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 18: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 18: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 18: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 18: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 24: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 24: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 24: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 
17: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 17: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 17: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 27: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 27: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 21: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 21: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 21: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 28: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 28: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 
28: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 26: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 26: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 26: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 30: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 30: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 30: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 0: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
1: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 5: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 2: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 8: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 8: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 8: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 8: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 
11: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 11: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 3: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 10: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 10: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 10: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 10: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 
9: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 9: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 14: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 14: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 14: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 14: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 15: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 15: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 15: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 12: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 
13: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 20: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 19: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 19: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 19: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 19: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 18: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 18: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 18: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 24: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 
24: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 24: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 17: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 17: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 17: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 27: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 21: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 21: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 21: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 21: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 
23: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 23: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 23: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 29: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 29: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 29: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 25: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 25: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 25: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 
28: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 26: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 30: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 31: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 31: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 16: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 16: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 16: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 22: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 
22: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 22: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 6: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
1: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 2: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 8: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 8: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 8: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 11: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 
3: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 10: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 10: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 9: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 9: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 9: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 9: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 14: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 
15: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 12: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 13: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 13: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 13: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 20: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 19: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 19: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 19: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 19: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 
18: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 24: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 17: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 27: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 21: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 23: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 29: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 25: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 25: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 
25: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 25: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 28: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 26: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 26: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 30: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 30: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 30: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 30: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 31: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 
31: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 16: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 22: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 22: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 22: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 22: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 6: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 4: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 1: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
5: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 2: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 8: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 11: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 10: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 10: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 14: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 14: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 15: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 
15: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 15: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 12: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 12: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 13: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 20: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 20: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 20: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 24: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 17: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 27: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 
23: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 23: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 23: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 29: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 29: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 28: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 28: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 26: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 31: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 
16: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 6: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 1: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 2: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 11: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 14: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 12: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 27: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 
27: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 23: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 29: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 28: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 31: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 16: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 0: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 11: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 12: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 
31: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 16: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 2: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 11: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 16: [2023-04-28 14:20:13,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 10: [2023-04-28 14:20:13,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-28 14:20:13,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-28 14:20:13,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 13: [2023-04-28 14:20:13,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-28 14:20:13,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-28 14:20:13,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
11: [2023-04-28 14:20:13,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-28 14:20:13,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-28 14:20:13,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 23: [2023-04-28 14:20:13,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 23: [2023-04-28 14:20:13,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2023-04-28 14:20:13,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 11: [2023-04-28 14:20:13,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 23: [2023-04-28 14:20:13,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 10: [2023-04-28 14:20:13,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 0: [2023-04-28 14:20:13,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-04-28 14:20:13,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-28 14:20:13,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 0: [2023-04-28 14:20:13,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-28 14:20:13,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-28 14:20:13,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 0: [2023-04-28 14:20:13,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-28 14:20:13,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-28 14:20:13,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-28 14:20:13,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-28 14:20:13,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 0: [2023-04-28 14:20:13,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 0: [2023-04-28 14:20:13,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2023-04-28 14:20:13,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-04-28 14:20:13,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-28 14:20:13,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-28 14:20:13,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-28 14:20:13,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 0: [2023-04-28 14:20:13,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 11: [2023-04-28 14:20:13,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 10: [2023-04-28 14:20:13,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 23: [2023-04-28 14:20:13,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 11: [2023-04-28 14:20:13,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 10: [2023-04-28 14:20:13,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 23: [2023-04-28 14:20:13,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
11: [2023-04-28 14:20:13,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 10: [2023-04-28 14:20:13,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 23: [2023-04-28 14:20:13,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-28 14:20:13,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 11: [2023-04-28 14:20:13,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 10: [2023-04-28 14:20:13,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 17: [2023-04-28 14:20:13,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 23: [2023-04-28 14:20:13,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-28 14:20:13,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 11: [2023-04-28 14:20:13,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 10: [2023-04-28 14:20:13,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
23: [2023-04-28 14:20:13,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 23: [2023-04-28 14:20:13,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 11: [2023-04-28 14:20:13,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-28 14:20:13,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 10: [2023-04-28 14:20:13,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 23: [2023-04-28 14:20:13,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 11: [2023-04-28 14:20:13,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-28 14:20:13,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 10: [2023-04-28 14:20:13,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 23: [2023-04-28 14:20:13,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 11: [2023-04-28 14:20:13,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 11: [2023-04-28 14:20:13,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
10: [2023-04-28 14:20:13,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 23: [2023-04-28 14:20:13,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 11: [2023-04-28 14:20:13,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 23: [2023-04-28 14:20:13,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 11: [2023-04-28 14:20:13,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 23: [2023-04-28 14:20:13,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 11: [2023-04-28 14:20:13,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 23: [2023-04-28 14:20:13,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 11: [2023-04-28 14:20:13,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 11: [2023-04-28 14:20:13,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-28 14:20:13,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 11: [2023-04-28 14:20:13,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 
11: [2023-04-28 14:20:13,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-28 14:20:13,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 17: [2023-04-28 14:20:13,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-28 14:20:13,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 17: [2023-04-28 14:20:13,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2023-04-28 14:20:13,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-28 14:20:13,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 17: [2023-04-28 14:20:13,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2023-04-28 14:20:13,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 
17: [2023-04-28 14:20:13,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-28 14:20:13,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2023-04-28 14:20:13,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 17: [2023-04-28 14:20:13,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 17: [2023-04-28 14:20:13,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-28 14:20:13,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-28 14:20:13,747] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 23: [2023-04-28 14:20:13,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 10: [2023-04-28 14:20:13,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 13: [2023-04-28 14:20:13,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 0: [2023-04-28 14:20:13,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-04-28 14:20:13,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-28 14:20:13,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 10: [2023-04-28 14:20:13,751] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 13: [2023-04-28 14:20:13,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 17: [2023-04-28 14:20:13,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 
23: [2023-04-28 14:20:13,751] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 10: [2023-04-28 14:20:13,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 13: [2023-04-28 14:20:13,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 23: [2023-04-28 14:20:13,751] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-28 14:20:13,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-28 14:20:13,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 29: [2023-04-28 14:20:13,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 10: [2023-04-28 14:20:13,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 
13: [2023-04-28 14:20:13,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 13: [2023-04-28 14:20:13,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 23: [2023-04-28 14:20:13,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 29: [2023-04-28 14:20:13,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-28 14:20:13,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-28 14:20:13,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 10: [2023-04-28 14:20:13,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 13: [2023-04-28 14:20:13,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 23: [2023-04-28 14:20:13,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 29: [2023-04-28 14:20:13,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 10: [2023-04-28 14:20:13,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now!
13: [2023-04-28 14:20:13,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-28 14:20:13,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 23: [2023-04-28 14:20:13,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 10: [2023-04-28 14:20:13,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-28 14:20:13,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 13: [2023-04-28 14:20:13,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 29: [2023-04-28 14:20:13,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 10: [2023-04-28 14:20:13,757] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-28 14:20:13,757] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 13: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt.
29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 10: [2023-04-28 14:20:13,757] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 13: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 29: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 10: [2023-04-28 14:20:13,757] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 13: [2023-04-28 14:20:13,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-28 14:20:13,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 13: [2023-04-28 14:20:13,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
13: [2023-04-28 14:20:13,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2023-04-28 14:20:13,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2023-04-28 14:20:13,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-28 14:20:13,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2023-04-28 14:20:13,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 13: [2023-04-28 14:20:13,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 17: [2023-04-28 14:20:13,751] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2023-04-28 14:20:13,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 17: [2023-04-28 14:20:13,751] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 17: [2023-04-28 14:20:13,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 
17: [2023-04-28 14:20:13,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-28 14:20:13,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-28 14:20:13,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 17: [2023-04-28 14:20:13,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 9: [2023-04-28 14:20:13,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-28 14:20:13,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 9: [2023-04-28 14:20:13,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 9: [2023-04-28 14:20:13,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 9: [2023-04-28 14:20:13,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-28 14:20:13,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 
9: [2023-04-28 14:20:13,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-28 14:20:13,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-28 14:20:13,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-28 14:20:13,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-28 14:20:13,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-28 14:20:13,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-28 14:20:13,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 9: [2023-04-28 14:20:13,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 9: [2023-04-28 14:20:13,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 9: [2023-04-28 14:20:13,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 9: [2023-04-28 14:20:13,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 9: [2023-04-28 14:20:13,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
9: [2023-04-28 14:20:13,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2023-04-28 14:20:13,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-28 14:20:13,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 31: [2023-04-28 14:20:13,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-28 14:20:13,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-28 14:20:13,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2023-04-28 14:20:13,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 31: [2023-04-28 14:20:13,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 
24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 31: [2023-04-28 14:20:13,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-28 14:20:13,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 31: [2023-04-28 14:20:13,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 31: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 31: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 31: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 31: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 31: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 31: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 31: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 
24: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 24: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 24: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 0: [2023-04-28 14:20:13,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 24: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
24: [2023-04-28 14:20:13,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 0: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 24: [2023-04-28 14:20:13,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 
30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-28 14:20:13,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-28 14:20:13,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-28 14:20:13,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-28 14:20:13,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-28 14:20:13,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-28 14:20:13,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-28 14:20:13,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step340000 is ready now! 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 30: [2023-04-28 14:20:13,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-28 14:20:13,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 
15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 15: [2023-04-28 14:20:13,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-28 14:20:13,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-28 14:20:13,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2023-04-28 14:20:13,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-28 14:20:13,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
15: [2023-04-28 14:20:13,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 15: [2023-04-28 14:20:13,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 15: [2023-04-28 14:20:13,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 15: [2023-04-28 14:20:13,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-28 14:20:13,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 15: [2023-04-28 14:20:13,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-28 14:20:13,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-28 14:20:13,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 
12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 12: [2023-04-28 14:20:13,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-28 14:20:13,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-28 14:20:13,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-28 14:20:13,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-28 14:20:13,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
12: [2023-04-28 14:20:13,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-28 14:20:13,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-28 14:20:13,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 12: [2023-04-28 14:20:13,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-04-28 14:20:13,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-04-28 14:20:13,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-28 14:20:13,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-04-28 14:20:13,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-28 14:20:13,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-28 14:20:13,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-28 14:20:13,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-28 14:20:13,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-28 14:20:13,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-28 14:20:13,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-28 14:20:13,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 6: [2023-04-28 14:20:13,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 4: [2023-04-28 14:20:13,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-28 14:20:13,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-28 14:20:13,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-04-28 14:20:13,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 4: [2023-04-28 14:20:13,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 4: [2023-04-28 14:20:13,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 
21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 4: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 
21: [2023-04-28 14:20:13,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-28 14:20:13,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-28 14:20:13,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-28 14:20:13,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-28 14:20:13,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 21: [2023-04-28 14:20:13,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-28 14:20:13,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-28 14:20:13,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 4: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 4: [2023-04-28 14:20:13,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 21: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 4: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 4: [2023-04-28 14:20:13,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-04-28 14:20:13,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 27: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 
27: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 28: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 28: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 28: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 27: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 27: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 27: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 27: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-28 14:20:13,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 
27: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 27: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 27: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 27: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 27: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
27: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 27: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 27: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 28: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-28 14:20:13,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 28: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 28: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 28: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 28: [2023-04-28 14:20:13,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
28: [2023-04-28 14:20:13,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2023-04-28 14:20:13,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-28 14:20:13,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 4: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 3: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 28: [2023-04-28 14:20:13,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-28 14:20:13,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 4: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 28: [2023-04-28 14:20:13,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2023-04-28 14:20:13,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
4: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 28: [2023-04-28 14:20:13,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 4: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 28: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 4: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 4: [2023-04-28 14:20:13,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 5: [2023-04-28 14:20:13,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-04-28 14:20:13,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-28 14:20:13,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-28 14:20:13,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-28 14:20:13,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
5: [2023-04-28 14:20:13,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 5: [2023-04-28 14:20:13,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-04-28 14:20:13,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-28 14:20:13,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 14: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2023-04-28 14:20:13,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 
14: [2023-04-28 14:20:13,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2023-04-28 14:20:13,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-28 14:20:13,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 5: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 14: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 14: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 14: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 14: [2023-04-28 14:20:13,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 14: [2023-04-28 14:20:13,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 
14: [2023-04-28 14:20:13,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-28 14:20:13,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 14: [2023-04-28 14:20:13,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-28 14:20:13,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-28 14:20:13,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 5: [2023-04-28 14:20:13,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-28 14:20:13,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 
20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 5: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 5: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 20: [2023-04-28 14:20:13,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-28 14:20:13,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-28 14:20:13,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-28 14:20:13,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-28 14:20:13,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 5: [2023-04-28 14:20:13,825] [INFO] 
[torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 20: [2023-04-28 14:20:13,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-28 14:20:13,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 5: [2023-04-28 14:20:13,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 20: [2023-04-28 14:20:13,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 5: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
5: [2023-04-28 14:20:13,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 20: [2023-04-28 14:20:13,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 5: [2023-04-28 14:20:13,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-28 14:20:13,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 5: [2023-04-28 14:20:13,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-04-28 14:20:13,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-28 14:20:13,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 
18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 18: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
18: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 18: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 14: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 14: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 18: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
14: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 9: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 9: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 
25: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 25: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 9: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 25: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-28 14:20:13,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 25: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 22: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 22: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 
22: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2023-04-28 14:20:13,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-28 14:20:13,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-28 14:20:13,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-28 14:20:13,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-28 14:20:13,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2023-04-28 14:20:13,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-28 14:20:13,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-28 14:20:13,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 22: [2023-04-28 14:20:13,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 22: [2023-04-28 14:20:13,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
22: [2023-04-28 14:20:13,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 22: [2023-04-28 14:20:13,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 22: [2023-04-28 14:20:13,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 22: [2023-04-28 14:20:13,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-28 14:20:13,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 22: [2023-04-28 14:20:13,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-28 14:20:13,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
7: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 7: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 
16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 8: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 8: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 8: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 8: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 
16: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 8: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 8: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 8: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 
16: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 8: [2023-04-28 14:20:13,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-28 14:20:13,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-28 14:20:13,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-28 14:20:13,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-28 14:20:13,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 16: [2023-04-28 14:20:13,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 8: [2023-04-28 14:20:13,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-28 14:20:13,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-28 14:20:13,838] [INFO] [engine.py:3213:_save_zero_checkpoint] 
bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 8: [2023-04-28 14:20:13,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 16: [2023-04-28 14:20:13,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 8: [2023-04-28 14:20:13,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 8: [2023-04-28 14:20:13,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 8: [2023-04-28 14:20:13,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 16: [2023-04-28 14:20:13,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 8: [2023-04-28 14:20:13,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 8: [2023-04-28 14:20:13,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
16: [2023-04-28 14:20:13,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 8: [2023-04-28 14:20:13,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 16: [2023-04-28 14:20:13,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 8: [2023-04-28 14:20:13,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 
26: [2023-04-28 14:20:13,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 26: [2023-04-28 14:20:13,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-28 14:20:13,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-28 14:20:13,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-28 14:20:13,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-28 14:20:13,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-28 14:20:13,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 26: [2023-04-28 14:20:13,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 26: [2023-04-28 14:20:13,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-04-28 14:20:13,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-28 14:20:13,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-28 14:20:13,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-28 14:20:13,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-28 14:20:13,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-04-28 14:20:13,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-28 14:20:13,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-28 14:20:13,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is 
ready now! 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 2: [2023-04-28 14:20:13,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 19: [2023-04-28 14:20:13,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 19: [2023-04-28 14:20:13,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-28 14:20:13,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-28 14:20:13,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2023-04-28 14:20:13,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-28 14:20:13,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 
19: [2023-04-28 14:20:13,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-28 14:20:13,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2023-04-28 14:20:13,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-28 14:20:13,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-28 14:20:13,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 19: [2023-04-28 14:20:13,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-28 14:20:13,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
19: [2023-04-28 14:20:13,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-28 14:20:13,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-28 14:20:13,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-28 14:20:13,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-28 14:20:13,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-28 14:20:13,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 19: [2023-04-28 14:20:13,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 19: [2023-04-28 14:20:13,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 19: [2023-04-28 14:20:13,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 19: [2023-04-28 14:20:13,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 19: [2023-04-28 14:20:13,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-04-28 14:20:13,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-28 14:20:13,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-28 14:20:13,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-28 14:20:13,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-28 14:20:13,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-28 14:20:13,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-28 14:20:13,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-04-28 14:20:13,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step340000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 1: [2023-04-28 14:20:13,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step340000 is ready now! 
0: successfully saved checkpoint at iteration 340000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 2982.74 31: iteration 340100/ 476837 | consumed samples: 87065600 | consumed tokens: 178310348800 | elapsed time per iteration (s): 0.71 | learning rate: 5.476E-05 | global batch size: 256 | lm loss: 2.446183E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 359.636 | TFLOPs: 21.76 | 31: iteration 340200/ 476837 | consumed samples: 87091200 | consumed tokens: 178362777600 | elapsed time per iteration (s): 0.68 | learning rate: 5.471E-05 | global batch size: 256 | lm loss: 2.445830E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.776 | TFLOPs: 22.73 | 31: iteration 340300/ 476837 | consumed samples: 87116800 | consumed tokens: 178415206400 | elapsed time per iteration (s): 0.68 | learning rate: 5.467E-05 | global batch size: 256 | lm loss: 2.441604E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.255 | TFLOPs: 22.76 | 31: iteration 340400/ 476837 | consumed samples: 87142400 | consumed tokens: 178467635200 | elapsed time per iteration (s): 0.68 | learning rate: 5.462E-05 | global batch size: 256 | lm loss: 2.443970E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.791 | TFLOPs: 22.79 | 31: iteration 340500/ 476837 | consumed samples: 87168000 | consumed tokens: 178520064000 | elapsed time per iteration (s): 0.68 | learning rate: 5.457E-05 | global batch size: 256 | lm loss: 2.442335E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.797 | TFLOPs: 22.80 | 31: iteration 340600/ 476837 | consumed samples: 87193600 | consumed tokens: 178572492800 | elapsed time per 
iteration (s): 0.68 | learning rate: 5.453E-05 | global batch size: 256 | lm loss: 2.444049E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.942 | TFLOPs: 22.80 | 31: iteration 340700/ 476837 | consumed samples: 87219200 | consumed tokens: 178624921600 | elapsed time per iteration (s): 0.68 | learning rate: 5.448E-05 | global batch size: 256 | lm loss: 2.443508E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.907 | TFLOPs: 22.80 | 31: iteration 340800/ 476837 | consumed samples: 87244800 | consumed tokens: 178677350400 | elapsed time per iteration (s): 0.68 | learning rate: 5.443E-05 | global batch size: 256 | lm loss: 2.441224E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.782 | TFLOPs: 22.79 | 31: iteration 340900/ 476837 | consumed samples: 87270400 | consumed tokens: 178729779200 | elapsed time per iteration (s): 0.68 | learning rate: 5.438E-05 | global batch size: 256 | lm loss: 2.446606E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.939 | TFLOPs: 22.80 | 31: iteration 341000/ 476837 | consumed samples: 87296000 | consumed tokens: 178782208000 | elapsed time per iteration (s): 0.68 | learning rate: 5.434E-05 | global batch size: 256 | lm loss: 2.441669E+00 | grad norm: 0.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.950 | TFLOPs: 22.68 | 31: iteration 341100/ 476837 | consumed samples: 87321600 | consumed tokens: 178834636800 | elapsed time per iteration (s): 0.68 | learning rate: 5.429E-05 | global batch size: 256 | lm loss: 2.447178E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.601 | 
TFLOPs: 22.72 | 31: iteration 341200/ 476837 | consumed samples: 87347200 | consumed tokens: 178887065600 | elapsed time per iteration (s): 0.68 | learning rate: 5.424E-05 | global batch size: 256 | lm loss: 2.446363E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.576 | TFLOPs: 22.78 | 31: iteration 341300/ 476837 | consumed samples: 87372800 | consumed tokens: 178939494400 | elapsed time per iteration (s): 0.68 | learning rate: 5.420E-05 | global batch size: 256 | lm loss: 2.445764E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.620 | TFLOPs: 22.72 | 31: iteration 341400/ 476837 | consumed samples: 87398400 | consumed tokens: 178991923200 | elapsed time per iteration (s): 0.68 | learning rate: 5.415E-05 | global batch size: 256 | lm loss: 2.442132E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.864 | TFLOPs: 22.80 | 31: iteration 341500/ 476837 | consumed samples: 87424000 | consumed tokens: 179044352000 | elapsed time per iteration (s): 0.68 | learning rate: 5.410E-05 | global batch size: 256 | lm loss: 2.448109E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.665 | TFLOPs: 22.79 | 31: iteration 341600/ 476837 | consumed samples: 87449600 | consumed tokens: 179096780800 | elapsed time per iteration (s): 0.68 | learning rate: 5.405E-05 | global batch size: 256 | lm loss: 2.444982E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.125 | TFLOPs: 22.82 | 31: iteration 341700/ 476837 | consumed samples: 87475200 | consumed tokens: 179149209600 | elapsed time per iteration (s): 0.68 | learning rate: 5.401E-05 | global batch size: 256 | lm loss: 2.444807E+00 | grad norm: 
0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.205 | TFLOPs: 22.82 | 31: iteration 341800/ 476837 | consumed samples: 87500800 | consumed tokens: 179201638400 | elapsed time per iteration (s): 0.68 | learning rate: 5.396E-05 | global batch size: 256 | lm loss: 2.443958E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.833 | TFLOPs: 22.74 | 31: iteration 341900/ 476837 | consumed samples: 87526400 | consumed tokens: 179254067200 | elapsed time per iteration (s): 0.68 | learning rate: 5.391E-05 | global batch size: 256 | lm loss: 2.443542E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.723 | TFLOPs: 22.73 | 0: [2023-04-28 14:42:53,980] [INFO] [logging.py:68:log_dist] [Rank 0] step=342000, skipped=0, lr=[5.3867358931602435e-05, 5.3867358931602435e-05, 5.3867358931602435e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 342000/ 476837 | consumed samples: 87552000 | consumed tokens: 179306496000 | elapsed time per iteration (s): 0.68 | learning rate: 5.387E-05 | global batch size: 256 | lm loss: 2.440169E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.821 | TFLOPs: 22.80 | 0: steps: 342000 loss: 2.4442 iter time (s): 0.678 samples/sec: 377.736 31: iteration 342100/ 476837 | consumed samples: 87577600 | consumed tokens: 179358924800 | elapsed time per iteration (s): 0.68 | learning rate: 5.382E-05 | global batch size: 256 | lm loss: 2.444807E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.805 | TFLOPs: 22.80 | 31: iteration 342200/ 476837 | consumed samples: 87603200 | consumed tokens: 179411353600 | elapsed time per iteration (s): 0.68 | learning rate: 5.377E-05 | 
global batch size: 256 | lm loss: 2.443958E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.652 | TFLOPs: 22.73 | 31: iteration 342300/ 476837 | consumed samples: 87628800 | consumed tokens: 179463782400 | elapsed time per iteration (s): 0.68 | learning rate: 5.373E-05 | global batch size: 256 | lm loss: 2.442171E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.587 | TFLOPs: 22.78 | 31: iteration 342400/ 476837 | consumed samples: 87654400 | consumed tokens: 179516211200 | elapsed time per iteration (s): 0.68 | learning rate: 5.368E-05 | global batch size: 256 | lm loss: 2.444767E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.191 | TFLOPs: 22.76 | 31: iteration 342500/ 476837 | consumed samples: 87680000 | consumed tokens: 179568640000 | elapsed time per iteration (s): 0.68 | learning rate: 5.363E-05 | global batch size: 256 | lm loss: 2.443633E+00 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.646 | TFLOPs: 22.79 | 31: iteration 342600/ 476837 | consumed samples: 87705600 | consumed tokens: 179621068800 | elapsed time per iteration (s): 0.68 | learning rate: 5.359E-05 | global batch size: 256 | lm loss: 2.439534E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.762 | TFLOPs: 22.79 | 31: iteration 342700/ 476837 | consumed samples: 87731200 | consumed tokens: 179673497600 | elapsed time per iteration (s): 0.68 | learning rate: 5.354E-05 | global batch size: 256 | lm loss: 2.450359E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.941 | TFLOPs: 22.74 | 31: iteration 342800/ 476837 | consumed 
samples: 87756800 | consumed tokens: 179725926400 | elapsed time per iteration (s): 0.68 | learning rate: 5.349E-05 | global batch size: 256 | lm loss: 2.442480E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.039 | TFLOPs: 22.75 | 31: iteration 342900/ 476837 | consumed samples: 87782400 | consumed tokens: 179778355200 | elapsed time per iteration (s): 0.79 | learning rate: 5.345E-05 | global batch size: 256 | lm loss: 2.442364E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 324.530 | TFLOPs: 19.63 | 31: iteration 343000/ 476837 | consumed samples: 87808000 | consumed tokens: 179830784000 | elapsed time per iteration (s): 0.85 | learning rate: 5.340E-05 | global batch size: 256 | lm loss: 2.445361E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 302.900 | TFLOPs: 18.32 | 31: iteration 343100/ 476837 | consumed samples: 87833600 | consumed tokens: 179883212800 | elapsed time per iteration (s): 0.68 | learning rate: 5.335E-05 | global batch size: 256 | lm loss: 2.443716E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.781 | TFLOPs: 22.79 | 31: iteration 343200/ 476837 | consumed samples: 87859200 | consumed tokens: 179935641600 | elapsed time per iteration (s): 0.68 | learning rate: 5.331E-05 | global batch size: 256 | lm loss: 2.440238E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.806 | TFLOPs: 22.80 | 31: iteration 343300/ 476837 | consumed samples: 87884800 | consumed tokens: 179988070400 | elapsed time per iteration (s): 0.68 | learning rate: 5.326E-05 | global batch size: 256 | lm loss: 2.444644E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 376.852 | TFLOPs: 22.80 | 31: iteration 343400/ 476837 | consumed samples: 87910400 | consumed tokens: 180040499200 | elapsed time per iteration (s): 0.68 | learning rate: 5.321E-05 | global batch size: 256 | lm loss: 2.440724E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.012 | TFLOPs: 22.69 | 31: iteration 343500/ 476837 | consumed samples: 87936000 | consumed tokens: 180092928000 | elapsed time per iteration (s): 0.68 | learning rate: 5.317E-05 | global batch size: 256 | lm loss: 2.446258E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.665 | TFLOPs: 22.79 | 31: iteration 343600/ 476837 | consumed samples: 87961600 | consumed tokens: 180145356800 | elapsed time per iteration (s): 0.68 | learning rate: 5.312E-05 | global batch size: 256 | lm loss: 2.442052E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.792 | TFLOPs: 22.67 | 31: iteration 343700/ 476837 | consumed samples: 87987200 | consumed tokens: 180197785600 | elapsed time per iteration (s): 0.68 | learning rate: 5.308E-05 | global batch size: 256 | lm loss: 2.443189E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.783 | TFLOPs: 22.73 | 31: iteration 343800/ 476837 | consumed samples: 88012800 | consumed tokens: 180250214400 | elapsed time per iteration (s): 0.68 | learning rate: 5.303E-05 | global batch size: 256 | lm loss: 2.438762E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.406 | TFLOPs: 22.71 | 31: iteration 343900/ 476837 | consumed samples: 88038400 | consumed tokens: 180302643200 | elapsed time per iteration (s): 0.68 | learning rate: 5.298E-05 
| global batch size: 256 | lm loss: 2.439284E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.880 | TFLOPs: 22.74 | 0: [2023-04-28 15:06:07,136] [INFO] [logging.py:68:log_dist] [Rank 0] step=344000, skipped=0, lr=[5.293600714833975e-05, 5.293600714833975e-05, 5.293600714833975e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 344000/ 476837 | consumed samples: 88064000 | consumed tokens: 180355072000 | elapsed time per iteration (s): 0.73 | learning rate: 5.294E-05 | global batch size: 256 | lm loss: 2.443698E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 351.768 | TFLOPs: 21.28 | 0: steps: 344000 loss: 2.4081 iter time (s): 0.694 samples/sec: 368.720 31: iteration 344100/ 476837 | consumed samples: 88089600 | consumed tokens: 180407500800 | elapsed time per iteration (s): 0.68 | learning rate: 5.289E-05 | global batch size: 256 | lm loss: 2.443966E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.669 | TFLOPs: 22.73 | 31: iteration 344200/ 476837 | consumed samples: 88115200 | consumed tokens: 180459929600 | elapsed time per iteration (s): 0.68 | learning rate: 5.284E-05 | global batch size: 256 | lm loss: 2.447068E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.336 | TFLOPs: 22.65 | 31: iteration 344300/ 476837 | consumed samples: 88140800 | consumed tokens: 180512358400 | elapsed time per iteration (s): 0.68 | learning rate: 5.280E-05 | global batch size: 256 | lm loss: 2.443280E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.447 | TFLOPs: 22.77 | 31: iteration 344400/ 476837 | consumed samples: 88166400 | consumed tokens: 180564787200 | elapsed 
time per iteration (s): 0.68 | learning rate: 5.275E-05 | global batch size: 256 | lm loss: 2.441924E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.509 | TFLOPs: 22.72 | 31: iteration 344500/ 476837 | consumed samples: 88192000 | consumed tokens: 180617216000 | elapsed time per iteration (s): 0.68 | learning rate: 5.270E-05 | global batch size: 256 | lm loss: 2.442362E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.207 | TFLOPs: 22.76 | 31: iteration 344600/ 476837 | consumed samples: 88217600 | consumed tokens: 180669644800 | elapsed time per iteration (s): 0.68 | learning rate: 5.266E-05 | global batch size: 256 | lm loss: 2.439665E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.061 | TFLOPs: 22.81 | 31: iteration 344700/ 476837 | consumed samples: 88243200 | consumed tokens: 180722073600 | elapsed time per iteration (s): 0.68 | learning rate: 5.261E-05 | global batch size: 256 | lm loss: 2.439921E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.654 | TFLOPs: 22.79 | 31: iteration 344800/ 476837 | consumed samples: 88268800 | consumed tokens: 180774502400 | elapsed time per iteration (s): 0.68 | learning rate: 5.257E-05 | global batch size: 256 | lm loss: 2.440541E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.681 | TFLOPs: 22.79 | 31: iteration 344900/ 476837 | consumed samples: 88294400 | consumed tokens: 180826931200 | elapsed time per iteration (s): 0.68 | learning rate: 5.252E-05 | global batch size: 256 | lm loss: 2.440273E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.019 
| TFLOPs: 22.81 | 31: iteration 345000/ 476837 | consumed samples: 88320000 | consumed tokens: 180879360000 | elapsed time per iteration (s): 0.68 | learning rate: 5.247E-05 | global batch size: 256 | lm loss: 2.438999E+00 | grad norm: 0.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.617 | TFLOPs: 22.78 | 31: iteration 345100/ 476837 | consumed samples: 88345600 | consumed tokens: 180931788800 | elapsed time per iteration (s): 0.68 | learning rate: 5.243E-05 | global batch size: 256 | lm loss: 2.440864E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.966 | TFLOPs: 22.81 | 31: iteration 345200/ 476837 | consumed samples: 88371200 | consumed tokens: 180984217600 | elapsed time per iteration (s): 0.68 | learning rate: 5.238E-05 | global batch size: 256 | lm loss: 2.440507E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.450 | TFLOPs: 22.71 | 31: iteration 345300/ 476837 | consumed samples: 88396800 | consumed tokens: 181036646400 | elapsed time per iteration (s): 0.68 | learning rate: 5.234E-05 | global batch size: 256 | lm loss: 2.441548E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.978 | TFLOPs: 22.75 | 31: iteration 345400/ 476837 | consumed samples: 88422400 | consumed tokens: 181089075200 | elapsed time per iteration (s): 0.68 | learning rate: 5.229E-05 | global batch size: 256 | lm loss: 2.441180E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.889 | TFLOPs: 22.80 | 31: iteration 345500/ 476837 | consumed samples: 88448000 | consumed tokens: 181141504000 | elapsed time per iteration (s): 0.68 | learning rate: 5.224E-05 | global batch size: 256 | lm loss: 2.440305E+00 | grad 
norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.426 | TFLOPs: 22.77 | 31: iteration 345600/ 476837 | consumed samples: 88473600 | consumed tokens: 181193932800 | elapsed time per iteration (s): 0.68 | learning rate: 5.220E-05 | global batch size: 256 | lm loss: 2.440613E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.232 | TFLOPs: 22.76 | 31: iteration 345700/ 476837 | consumed samples: 88499200 | consumed tokens: 181246361600 | elapsed time per iteration (s): 0.68 | learning rate: 5.215E-05 | global batch size: 256 | lm loss: 2.438924E+00 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.777 | TFLOPs: 22.79 | 31: iteration 345800/ 476837 | consumed samples: 88524800 | consumed tokens: 181298790400 | elapsed time per iteration (s): 0.68 | learning rate: 5.211E-05 | global batch size: 256 | lm loss: 2.442705E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.281 | TFLOPs: 22.70 | 31: iteration 345900/ 476837 | consumed samples: 88550400 | consumed tokens: 181351219200 | elapsed time per iteration (s): 0.68 | learning rate: 5.206E-05 | global batch size: 256 | lm loss: 2.438939E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.596 | TFLOPs: 22.78 | 0: [2023-04-28 15:28:47,875] [INFO] [logging.py:68:log_dist] [Rank 0] step=346000, skipped=0, lr=[5.2014764288486835e-05, 5.2014764288486835e-05, 5.2014764288486835e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 346000/ 476837 | consumed samples: 88576000 | consumed tokens: 181403648000 | elapsed time per iteration (s): 0.68 | learning rate: 5.201E-05 | global batch size: 256 | lm loss: 2.434185E+00 | grad norm: 0.436 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.546 | TFLOPs: 22.78 | 0: steps: 346000 loss: 2.4474 iter time (s): 0.678 samples/sec: 377.647 31: iteration 346100/ 476837 | consumed samples: 88601600 | consumed tokens: 181456076800 | elapsed time per iteration (s): 0.68 | learning rate: 5.197E-05 | global batch size: 256 | lm loss: 2.437235E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.173 | TFLOPs: 22.76 | 31: iteration 346200/ 476837 | consumed samples: 88627200 | consumed tokens: 181508505600 | elapsed time per iteration (s): 0.68 | learning rate: 5.192E-05 | global batch size: 256 | lm loss: 2.443190E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.850 | TFLOPs: 22.80 | 31: iteration 346300/ 476837 | consumed samples: 88652800 | consumed tokens: 181560934400 | elapsed time per iteration (s): 0.80 | learning rate: 5.188E-05 | global batch size: 256 | lm loss: 2.436555E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 318.815 | TFLOPs: 19.29 | 31: iteration 346400/ 476837 | consumed samples: 88678400 | consumed tokens: 181613363200 | elapsed time per iteration (s): 0.83 | learning rate: 5.183E-05 | global batch size: 256 | lm loss: 2.442606E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 307.353 | TFLOPs: 18.59 | 31: iteration 346500/ 476837 | consumed samples: 88704000 | consumed tokens: 181665792000 | elapsed time per iteration (s): 0.68 | learning rate: 5.179E-05 | global batch size: 256 | lm loss: 2.440945E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.983 | TFLOPs: 22.81 | 31: iteration 346600/ 476837 | 
consumed samples: 88729600 | consumed tokens: 181718220800 | elapsed time per iteration (s): 0.68 | learning rate: 5.174E-05 | global batch size: 256 | lm loss: 2.437974E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.689 | TFLOPs: 22.73 | 31: iteration 346700/ 476837 | consumed samples: 88755200 | consumed tokens: 181770649600 | elapsed time per iteration (s): 0.68 | learning rate: 5.169E-05 | global batch size: 256 | lm loss: 2.438604E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.851 | TFLOPs: 22.68 | 31: iteration 346800/ 476837 | consumed samples: 88780800 | consumed tokens: 181823078400 | elapsed time per iteration (s): 0.68 | learning rate: 5.165E-05 | global batch size: 256 | lm loss: 2.438030E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.567 | TFLOPs: 22.78 | 31: iteration 346900/ 476837 | consumed samples: 88806400 | consumed tokens: 181875507200 | elapsed time per iteration (s): 0.68 | learning rate: 5.160E-05 | global batch size: 256 | lm loss: 2.443954E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.701 | TFLOPs: 22.79 | 31: iteration 347000/ 476837 | consumed samples: 88832000 | consumed tokens: 181927936000 | elapsed time per iteration (s): 0.68 | learning rate: 5.156E-05 | global batch size: 256 | lm loss: 2.435620E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.757 | TFLOPs: 22.73 | 31: iteration 347100/ 476837 | consumed samples: 88857600 | consumed tokens: 181980364800 | elapsed time per iteration (s): 0.68 | learning rate: 5.151E-05 | global batch size: 256 | lm loss: 2.439776E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 373.818 | TFLOPs: 22.62 | 31: iteration 347200/ 476837 | consumed samples: 88883200 | consumed tokens: 182032793600 | elapsed time per iteration (s): 0.68 | learning rate: 5.147E-05 | global batch size: 256 | lm loss: 2.440007E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.002 | TFLOPs: 22.81 | 31: iteration 347300/ 476837 | consumed samples: 88908800 | consumed tokens: 182085222400 | elapsed time per iteration (s): 0.68 | learning rate: 5.142E-05 | global batch size: 256 | lm loss: 2.443513E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.898 | TFLOPs: 22.80 | 31: iteration 347400/ 476837 | consumed samples: 88934400 | consumed tokens: 182137651200 | elapsed time per iteration (s): 0.68 | learning rate: 5.138E-05 | global batch size: 256 | lm loss: 2.437789E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.161 | TFLOPs: 22.76 | 31: iteration 347500/ 476837 | consumed samples: 88960000 | consumed tokens: 182190080000 | elapsed time per iteration (s): 0.68 | learning rate: 5.133E-05 | global batch size: 256 | lm loss: 2.438344E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.245 | TFLOPs: 22.70 | 31: iteration 347600/ 476837 | consumed samples: 88985600 | consumed tokens: 182242508800 | elapsed time per iteration (s): 0.68 | learning rate: 5.129E-05 | global batch size: 256 | lm loss: 2.438838E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.442 | TFLOPs: 22.77 | 31: iteration 347700/ 476837 | consumed samples: 89011200 | consumed tokens: 182294937600 | elapsed time per iteration (s): 0.68 | learning 
rate: 5.124E-05 | global batch size: 256 | lm loss: 2.438398E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.859 | TFLOPs: 22.80 | 31: iteration 347800/ 476837 | consumed samples: 89036800 | consumed tokens: 182347366400 | elapsed time per iteration (s): 0.68 | learning rate: 5.119E-05 | global batch size: 256 | lm loss: 2.436643E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.955 | TFLOPs: 22.74 | 31: iteration 347900/ 476837 | consumed samples: 89062400 | consumed tokens: 182399795200 | elapsed time per iteration (s): 0.68 | learning rate: 5.115E-05 | global batch size: 256 | lm loss: 2.437724E+00 | grad norm: 0.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.302 | TFLOPs: 22.77 | 0: [2023-04-28 15:51:56,523] [INFO] [logging.py:68:log_dist] [Rank 0] step=348000, skipped=0, lr=[5.1103793550814864e-05, 5.1103793550814864e-05, 5.1103793550814864e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 348000/ 476837 | consumed samples: 89088000 | consumed tokens: 182452224000 | elapsed time per iteration (s): 0.68 | learning rate: 5.110E-05 | global batch size: 256 | lm loss: 2.438650E+00 | grad norm: 0.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.394 | TFLOPs: 22.77 | 0: steps: 348000 loss: 2.4291 iter time (s): 0.691 samples/sec: 370.573 31: iteration 348100/ 476837 | consumed samples: 89113600 | consumed tokens: 182504652800 | elapsed time per iteration (s): 0.68 | learning rate: 5.106E-05 | global batch size: 256 | lm loss: 2.440149E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.756 | TFLOPs: 22.79 | 31: iteration 348200/ 476837 | consumed samples: 89139200 | consumed tokens: 
182557081600 | elapsed time per iteration (s): 0.68 | learning rate: 5.101E-05 | global batch size: 256 | lm loss: 2.436544E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.898 | TFLOPs: 22.80 | 31: iteration 348300/ 476837 | consumed samples: 89164800 | consumed tokens: 182609510400 | elapsed time per iteration (s): 0.68 | learning rate: 5.097E-05 | global batch size: 256 | lm loss: 2.439778E+00 | grad norm: 0.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.018 | TFLOPs: 22.75 | 31: iteration 348400/ 476837 | consumed samples: 89190400 | consumed tokens: 182661939200 | elapsed time per iteration (s): 0.68 | learning rate: 5.092E-05 | global batch size: 256 | lm loss: 2.435558E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.304 | TFLOPs: 22.77 | 31: iteration 348500/ 476837 | consumed samples: 89216000 | consumed tokens: 182714368000 | elapsed time per iteration (s): 0.68 | learning rate: 5.088E-05 | global batch size: 256 | lm loss: 2.438636E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.833 | TFLOPs: 22.80 | 31: iteration 348600/ 476837 | consumed samples: 89241600 | consumed tokens: 182766796800 | elapsed time per iteration (s): 0.68 | learning rate: 5.083E-05 | global batch size: 256 | lm loss: 2.434644E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.959 | TFLOPs: 22.62 | 31: iteration 348700/ 476837 | consumed samples: 89267200 | consumed tokens: 182819225600 | elapsed time per iteration (s): 0.70 | learning rate: 5.079E-05 | global batch size: 256 | lm loss: 2.435596E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 367.452 | TFLOPs: 22.23 | 31: iteration 348800/ 476837 | consumed samples: 89292800 | consumed tokens: 182871654400 | elapsed time per iteration (s): 0.68 | learning rate: 5.074E-05 | global batch size: 256 | lm loss: 2.439361E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.967 | TFLOPs: 22.68 | 31: iteration 348900/ 476837 | consumed samples: 89318400 | consumed tokens: 182924083200 | elapsed time per iteration (s): 0.68 | learning rate: 5.070E-05 | global batch size: 256 | lm loss: 2.436208E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.106 | TFLOPs: 22.81 | 31: iteration 349000/ 476837 | consumed samples: 89344000 | consumed tokens: 182976512000 | elapsed time per iteration (s): 0.69 | learning rate: 5.065E-05 | global batch size: 256 | lm loss: 2.437447E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.604 | TFLOPs: 22.54 | 31: iteration 349100/ 476837 | consumed samples: 89369600 | consumed tokens: 183028940800 | elapsed time per iteration (s): 0.70 | learning rate: 5.061E-05 | global batch size: 256 | lm loss: 2.436155E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.675 | TFLOPs: 22.18 | 31: iteration 349200/ 476837 | consumed samples: 89395200 | consumed tokens: 183081369600 | elapsed time per iteration (s): 0.70 | learning rate: 5.056E-05 | global batch size: 256 | lm loss: 2.438223E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.390 | TFLOPs: 22.17 | 31: iteration 349300/ 476837 | consumed samples: 89420800 | consumed tokens: 183133798400 | elapsed time per iteration (s): 0.68 | learning rate: 5.052E-05 | global batch size: 256 | lm 
loss: 2.440616E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.099 | TFLOPs: 22.69 | 31: iteration 349400/ 476837 | consumed samples: 89446400 | consumed tokens: 183186227200 | elapsed time per iteration (s): 0.68 | learning rate: 5.047E-05 | global batch size: 256 | lm loss: 2.439040E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.741 | TFLOPs: 22.73 | 31: iteration 349500/ 476837 | consumed samples: 89472000 | consumed tokens: 183238656000 | elapsed time per iteration (s): 0.68 | learning rate: 5.043E-05 | global batch size: 256 | lm loss: 2.438591E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.084 | TFLOPs: 22.69 | 31: iteration 349600/ 476837 | consumed samples: 89497600 | consumed tokens: 183291084800 | elapsed time per iteration (s): 0.68 | learning rate: 5.038E-05 | global batch size: 256 | lm loss: 2.437233E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.521 | TFLOPs: 22.72 | 31: iteration 349700/ 476837 | consumed samples: 89523200 | consumed tokens: 183343513600 | elapsed time per iteration (s): 0.70 | learning rate: 5.034E-05 | global batch size: 256 | lm loss: 2.436428E+00 | grad norm: 0.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 366.105 | TFLOPs: 22.15 | 31: iteration 349800/ 476837 | consumed samples: 89548800 | consumed tokens: 183395942400 | elapsed time per iteration (s): 0.94 | learning rate: 5.029E-05 | global batch size: 256 | lm loss: 2.439695E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 273.736 | TFLOPs: 16.56 | 31: iteration 349900/ 476837 | consumed samples: 89574400 | 
consumed tokens: 183448371200 | elapsed time per iteration (s): 0.69 | learning rate: 5.025E-05 | global batch size: 256 | lm loss: 2.439142E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.886 | TFLOPs: 22.50 | 0: [2023-04-28 16:15:12,154] [INFO] [logging.py:68:log_dist] [Rank 0] step=350000, skipped=0, lr=[5.0203256314381896e-05, 5.0203256314381896e-05, 5.0203256314381896e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 350000/ 476837 | consumed samples: 89600000 | consumed tokens: 183500800000 | elapsed time per iteration (s): 0.68 | learning rate: 5.020E-05 | global batch size: 256 | lm loss: 2.435689E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.761 | TFLOPs: 22.79 | 0: steps: 350000 loss: 2.4352 iter time (s): 0.694 samples/sec: 369.140 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 350000 | lm loss value: 2.950765E+00 | lm loss PPL: 1.912058E+01 | 31: ------------------------------------------------------------------------------------------------- 31: iteration 350100/ 476837 | consumed samples: 89625600 | consumed tokens: 183553228800 | elapsed time per iteration (s): 0.74 | learning rate: 5.016E-05 | global batch size: 256 | lm loss: 2.436453E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 348.146 | TFLOPs: 21.06 | 31: iteration 350200/ 476837 | consumed samples: 89651200 | consumed tokens: 183605657600 | elapsed time per iteration (s): 0.68 | learning rate: 5.011E-05 | global batch size: 256 | lm loss: 2.436998E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.695 | TFLOPs: 22.79 | 31: iteration 350300/ 476837 | consumed samples: 
89676800 | consumed tokens: 183658086400 | elapsed time per iteration (s): 0.68 | learning rate: 5.007E-05 | global batch size: 256 | lm loss: 2.436665E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.963 | TFLOPs: 22.81 | 31: iteration 350400/ 476837 | consumed samples: 89702400 | consumed tokens: 183710515200 | elapsed time per iteration (s): 0.68 | learning rate: 5.002E-05 | global batch size: 256 | lm loss: 2.437316E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.302 | TFLOPs: 22.77 | 31: iteration 350500/ 476837 | consumed samples: 89728000 | consumed tokens: 183762944000 | elapsed time per iteration (s): 0.68 | learning rate: 4.998E-05 | global batch size: 256 | lm loss: 2.435927E+00 | grad norm: 0.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.732 | TFLOPs: 22.79 | 31: iteration 350600/ 476837 | consumed samples: 89753600 | consumed tokens: 183815372800 | elapsed time per iteration (s): 0.68 | learning rate: 4.994E-05 | global batch size: 256 | lm loss: 2.436154E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.747 | TFLOPs: 22.79 | 31: iteration 350700/ 476837 | consumed samples: 89779200 | consumed tokens: 183867801600 | elapsed time per iteration (s): 0.69 | learning rate: 4.989E-05 | global batch size: 256 | lm loss: 2.438142E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.438 | TFLOPs: 22.47 | 31: iteration 350800/ 476837 | consumed samples: 89804800 | consumed tokens: 183920230400 | elapsed time per iteration (s): 0.68 | learning rate: 4.985E-05 | global batch size: 256 | lm loss: 2.435072E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 376.081 | TFLOPs: 22.75 | 31: iteration 350900/ 476837 | consumed samples: 89830400 | consumed tokens: 183972659200 | elapsed time per iteration (s): 0.68 | learning rate: 4.980E-05 | global batch size: 256 | lm loss: 2.436733E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.319 | TFLOPs: 22.77 | 31: iteration 351000/ 476837 | consumed samples: 89856000 | consumed tokens: 184025088000 | elapsed time per iteration (s): 0.68 | learning rate: 4.976E-05 | global batch size: 256 | lm loss: 2.434617E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.701 | TFLOPs: 22.79 | 31: iteration 351100/ 476837 | consumed samples: 89881600 | consumed tokens: 184077516800 | elapsed time per iteration (s): 0.68 | learning rate: 4.971E-05 | global batch size: 256 | lm loss: 2.434688E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.027 | TFLOPs: 22.69 | 31: iteration 351200/ 476837 | consumed samples: 89907200 | consumed tokens: 184129945600 | elapsed time per iteration (s): 0.68 | learning rate: 4.967E-05 | global batch size: 256 | lm loss: 2.437532E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.976 | TFLOPs: 22.75 | 31: iteration 351300/ 476837 | consumed samples: 89932800 | consumed tokens: 184182374400 | elapsed time per iteration (s): 0.68 | learning rate: 4.962E-05 | global batch size: 256 | lm loss: 2.433622E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.057 | TFLOPs: 22.81 | 31: iteration 351400/ 476837 | consumed samples: 89958400 | consumed tokens: 184234803200 | elapsed time per iteration (s): 0.69 | learning rate: 4.958E-05 | global 
batch size: 256 | lm loss: 2.438515E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.591 | TFLOPs: 22.48 | 31: iteration 351500/ 476837 | consumed samples: 89984000 | consumed tokens: 184287232000 | elapsed time per iteration (s): 0.68 | learning rate: 4.953E-05 | global batch size: 256 | lm loss: 2.433385E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.609 | TFLOPs: 22.78 | 31: iteration 351600/ 476837 | consumed samples: 90009600 | consumed tokens: 184339660800 | elapsed time per iteration (s): 0.68 | learning rate: 4.949E-05 | global batch size: 256 | lm loss: 2.435554E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.513 | TFLOPs: 22.78 | 31: iteration 351700/ 476837 | consumed samples: 90035200 | consumed tokens: 184392089600 | elapsed time per iteration (s): 0.68 | learning rate: 4.945E-05 | global batch size: 256 | lm loss: 2.436939E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.861 | TFLOPs: 22.80 | 31: iteration 351800/ 476837 | consumed samples: 90060800 | consumed tokens: 184444518400 | elapsed time per iteration (s): 0.68 | learning rate: 4.940E-05 | global batch size: 256 | lm loss: 2.437536E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.091 | TFLOPs: 22.81 | 31: iteration 351900/ 476837 | consumed samples: 90086400 | consumed tokens: 184496947200 | elapsed time per iteration (s): 0.68 | learning rate: 4.936E-05 | global batch size: 256 | lm loss: 2.431897E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.100 | TFLOPs: 22.81 | 0: [2023-04-28 16:37:59,416] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=352000, skipped=0, lr=[4.9313312109944464e-05, 4.9313312109944464e-05, 4.9313312109944464e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 352000/ 476837 | consumed samples: 90112000 | consumed tokens: 184549376000 | elapsed time per iteration (s): 0.68 | learning rate: 4.931E-05 | global batch size: 256 | lm loss: 2.436549E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.721 | TFLOPs: 22.73 | 0: steps: 352000 loss: 2.4345 iter time (s): 0.680 samples/sec: 376.377 31: iteration 352100/ 476837 | consumed samples: 90137600 | consumed tokens: 184601804800 | elapsed time per iteration (s): 0.68 | learning rate: 4.927E-05 | global batch size: 256 | lm loss: 2.437265E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.989 | TFLOPs: 22.81 | 31: iteration 352200/ 476837 | consumed samples: 90163200 | consumed tokens: 184654233600 | elapsed time per iteration (s): 0.68 | learning rate: 4.922E-05 | global batch size: 256 | lm loss: 2.427464E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.003 | TFLOPs: 22.81 | 31: iteration 352300/ 476837 | consumed samples: 90188800 | consumed tokens: 184706662400 | elapsed time per iteration (s): 0.68 | learning rate: 4.918E-05 | global batch size: 256 | lm loss: 2.439470E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.701 | TFLOPs: 22.73 | 31: iteration 352400/ 476837 | consumed samples: 90214400 | consumed tokens: 184759091200 | elapsed time per iteration (s): 0.68 | learning rate: 4.914E-05 | global batch size: 256 | lm loss: 2.436430E+00 | grad norm: 0.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
375.530 | TFLOPs: 22.72 | 31: iteration 352500/ 476837 | consumed samples: 90240000 | consumed tokens: 184811520000 | elapsed time per iteration (s): 0.68 | learning rate: 4.909E-05 | global batch size: 256 | lm loss: 2.432105E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.493 | TFLOPs: 22.78 | 31: iteration 352600/ 476837 | consumed samples: 90265600 | consumed tokens: 184863948800 | elapsed time per iteration (s): 0.68 | learning rate: 4.905E-05 | global batch size: 256 | lm loss: 2.434211E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.940 | TFLOPs: 22.68 | 31: iteration 352700/ 476837 | consumed samples: 90291200 | consumed tokens: 184916377600 | elapsed time per iteration (s): 0.68 | learning rate: 4.900E-05 | global batch size: 256 | lm loss: 2.429881E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.503 | TFLOPs: 22.78 | 31: iteration 352800/ 476837 | consumed samples: 90316800 | consumed tokens: 184968806400 | elapsed time per iteration (s): 0.68 | learning rate: 4.896E-05 | global batch size: 256 | lm loss: 2.432279E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.248 | TFLOPs: 22.76 | 31: iteration 352900/ 476837 | consumed samples: 90342400 | consumed tokens: 185021235200 | elapsed time per iteration (s): 0.68 | learning rate: 4.892E-05 | global batch size: 256 | lm loss: 2.433224E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.890 | TFLOPs: 22.80 | 31: iteration 353000/ 476837 | consumed samples: 90368000 | consumed tokens: 185073664000 | elapsed time per iteration (s): 0.68 | learning rate: 4.887E-05 | global batch size: 256 | lm loss: 2.433222E+00 | 
grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.967 | TFLOPs: 22.81 | 31: iteration 353100/ 476837 | consumed samples: 90393600 | consumed tokens: 185126092800 | elapsed time per iteration (s): 0.68 | learning rate: 4.883E-05 | global batch size: 256 | lm loss: 2.435396E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.184 | TFLOPs: 22.76 | 31: iteration 353200/ 476837 | consumed samples: 90419200 | consumed tokens: 185178521600 | elapsed time per iteration (s): 0.79 | learning rate: 4.878E-05 | global batch size: 256 | lm loss: 2.430807E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 325.140 | TFLOPs: 19.67 | 31: iteration 353300/ 476837 | consumed samples: 90444800 | consumed tokens: 185230950400 | elapsed time per iteration (s): 0.86 | learning rate: 4.874E-05 | global batch size: 256 | lm loss: 2.435355E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 297.298 | TFLOPs: 17.99 | 31: iteration 353400/ 476837 | consumed samples: 90470400 | consumed tokens: 185283379200 | elapsed time per iteration (s): 0.68 | learning rate: 4.870E-05 | global batch size: 256 | lm loss: 2.431706E+00 | grad norm: 0.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 377.046 | TFLOPs: 22.81 | 31: iteration 353500/ 476837 | consumed samples: 90496000 | consumed tokens: 185335808000 | elapsed time per iteration (s): 0.68 | learning rate: 4.865E-05 | global batch size: 256 | lm loss: 2.435225E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.736 | TFLOPs: 22.73 | 31: iteration 353600/ 476837 | consumed samples: 90521600 | consumed tokens: 
185388236800 | elapsed time per iteration (s): 0.68 | learning rate: 4.861E-05 | global batch size: 256 | lm loss: 2.432186E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.383 | TFLOPs: 22.77 | 31: iteration 353700/ 476837 | consumed samples: 90547200 | consumed tokens: 185440665600 | elapsed time per iteration (s): 0.68 | learning rate: 4.857E-05 | global batch size: 256 | lm loss: 2.429468E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.162 | TFLOPs: 22.76 | 31: iteration 353800/ 476837 | consumed samples: 90572800 | consumed tokens: 185493094400 | elapsed time per iteration (s): 0.68 | learning rate: 4.852E-05 | global batch size: 256 | lm loss: 2.434322E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.718 | TFLOPs: 22.79 | 31: iteration 353900/ 476837 | consumed samples: 90598400 | consumed tokens: 185545523200 | elapsed time per iteration (s): 0.68 | learning rate: 4.848E-05 | global batch size: 256 | lm loss: 2.434661E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.898 | TFLOPs: 22.80 | 0: [2023-04-28 17:01:08,643] [INFO] [logging.py:68:log_dist] [Rank 0] step=354000, skipped=0, lr=[4.8434118591696614e-05, 4.8434118591696614e-05, 4.8434118591696614e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 354000/ 476837 | consumed samples: 90624000 | consumed tokens: 185597952000 | elapsed time per iteration (s): 0.68 | learning rate: 4.843E-05 | global batch size: 256 | lm loss: 2.434641E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.969 | TFLOPs: 22.75 | 0: steps: 354000 loss: 2.4088 iter time (s): 0.691 samples/sec: 370.452 31: iteration 354100/ 
476837 | consumed samples: 90649600 | consumed tokens: 185650380800 | elapsed time per iteration (s): 0.68 | learning rate: 4.839E-05 | global batch size: 256 | lm loss: 2.438813E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.185 | TFLOPs: 22.76 | 31: iteration 354200/ 476837 | consumed samples: 90675200 | consumed tokens: 185702809600 | elapsed time per iteration (s): 0.68 | learning rate: 4.835E-05 | global batch size: 256 | lm loss: 2.432526E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.665 | TFLOPs: 22.67 | 31: iteration 354300/ 476837 | consumed samples: 90700800 | consumed tokens: 185755238400 | elapsed time per iteration (s): 0.68 | learning rate: 4.830E-05 | global batch size: 256 | lm loss: 2.435060E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.846 | TFLOPs: 22.80 | 31: iteration 354400/ 476837 | consumed samples: 90726400 | consumed tokens: 185807667200 | elapsed time per iteration (s): 0.68 | learning rate: 4.826E-05 | global batch size: 256 | lm loss: 2.436289E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.827 | TFLOPs: 22.80 | 31: iteration 354500/ 476837 | consumed samples: 90752000 | consumed tokens: 185860096000 | elapsed time per iteration (s): 0.68 | learning rate: 4.822E-05 | global batch size: 256 | lm loss: 2.432082E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.818 | TFLOPs: 22.80 | 31: iteration 354600/ 476837 | consumed samples: 90777600 | consumed tokens: 185912524800 | elapsed time per iteration (s): 0.68 | learning rate: 4.817E-05 | global batch size: 256 | lm loss: 2.431078E+00 | grad norm: 0.407 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.815 | TFLOPs: 22.80 | 31: iteration 354700/ 476837 | consumed samples: 90803200 | consumed tokens: 185964953600 | elapsed time per iteration (s): 0.68 | learning rate: 4.813E-05 | global batch size: 256 | lm loss: 2.432535E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.163 | TFLOPs: 22.70 | 31: iteration 354800/ 476837 | consumed samples: 90828800 | consumed tokens: 186017382400 | elapsed time per iteration (s): 0.69 | learning rate: 4.809E-05 | global batch size: 256 | lm loss: 2.432922E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.048 | TFLOPs: 22.51 | 31: iteration 354900/ 476837 | consumed samples: 90854400 | consumed tokens: 186069811200 | elapsed time per iteration (s): 0.69 | learning rate: 4.804E-05 | global batch size: 256 | lm loss: 2.433087E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.378 | TFLOPs: 22.47 | 31: iteration 355000/ 476837 | consumed samples: 90880000 | consumed tokens: 186122240000 | elapsed time per iteration (s): 0.68 | learning rate: 4.800E-05 | global batch size: 256 | lm loss: 2.429716E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.548 | TFLOPs: 22.66 | 31: iteration 355100/ 476837 | consumed samples: 90905600 | consumed tokens: 186174668800 | elapsed time per iteration (s): 0.68 | learning rate: 4.796E-05 | global batch size: 256 | lm loss: 2.431436E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.186 | TFLOPs: 22.70 | 31: iteration 355200/ 476837 | consumed samples: 90931200 | consumed tokens: 186227097600 | elapsed time per iteration (s): 0.68 | 
learning rate: 4.791E-05 | global batch size: 256 | lm loss: 2.432283E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.691 | TFLOPs: 22.73 | 31: iteration 355300/ 476837 | consumed samples: 90956800 | consumed tokens: 186279526400 | elapsed time per iteration (s): 0.68 | learning rate: 4.787E-05 | global batch size: 256 | lm loss: 2.428175E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.164 | TFLOPs: 22.70 | 31: iteration 355400/ 476837 | consumed samples: 90982400 | consumed tokens: 186331955200 | elapsed time per iteration (s): 0.68 | learning rate: 4.783E-05 | global batch size: 256 | lm loss: 2.430076E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.215 | TFLOPs: 22.64 | 31: iteration 355500/ 476837 | consumed samples: 91008000 | consumed tokens: 186384384000 | elapsed time per iteration (s): 0.68 | learning rate: 4.778E-05 | global batch size: 256 | lm loss: 2.433488E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.200 | TFLOPs: 22.76 | 31: iteration 355600/ 476837 | consumed samples: 91033600 | consumed tokens: 186436812800 | elapsed time per iteration (s): 0.68 | learning rate: 4.774E-05 | global batch size: 256 | lm loss: 2.432324E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.116 | TFLOPs: 22.75 | 31: iteration 355700/ 476837 | consumed samples: 91059200 | consumed tokens: 186489241600 | elapsed time per iteration (s): 0.68 | learning rate: 4.770E-05 | global batch size: 256 | lm loss: 2.428151E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.537 | TFLOPs: 22.78 | 31: 
iteration 355800/ 476837 | consumed samples: 91084800 | consumed tokens: 186541670400 | elapsed time per iteration (s): 0.68 | learning rate: 4.765E-05 | global batch size: 256 | lm loss: 2.430999E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.917 | TFLOPs: 22.74 | 31: iteration 355900/ 476837 | consumed samples: 91110400 | consumed tokens: 186594099200 | elapsed time per iteration (s): 0.68 | learning rate: 4.761E-05 | global batch size: 256 | lm loss: 2.430245E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.662 | TFLOPs: 22.79 | 0: [2023-04-28 17:23:52,230] [INFO] [logging.py:68:log_dist] [Rank 0] step=356000, skipped=0, lr=[4.756583150934136e-05, 4.756583150934136e-05, 4.756583150934136e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 356000/ 476837 | consumed samples: 91136000 | consumed tokens: 186646528000 | elapsed time per iteration (s): 0.68 | learning rate: 4.757E-05 | global batch size: 256 | lm loss: 2.435338E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.749 | TFLOPs: 22.79 | 0: steps: 356000 loss: 2.4121 iter time (s): 0.678 samples/sec: 377.310 31: iteration 356100/ 476837 | consumed samples: 91161600 | consumed tokens: 186698956800 | elapsed time per iteration (s): 0.68 | learning rate: 4.752E-05 | global batch size: 256 | lm loss: 2.429353E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.341 | TFLOPs: 22.77 | 31: iteration 356200/ 476837 | consumed samples: 91187200 | consumed tokens: 186751385600 | elapsed time per iteration (s): 0.68 | learning rate: 4.748E-05 | global batch size: 256 | lm loss: 2.431850E+00 | grad norm: 0.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 376.270 | TFLOPs: 22.76 | 31: iteration 356300/ 476837 | consumed samples: 91212800 | consumed tokens: 186803814400 | elapsed time per iteration (s): 0.68 | learning rate: 4.744E-05 | global batch size: 256 | lm loss: 2.431714E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.622 | TFLOPs: 22.78 | 31: iteration 356400/ 476837 | consumed samples: 91238400 | consumed tokens: 186856243200 | elapsed time per iteration (s): 0.68 | learning rate: 4.739E-05 | global batch size: 256 | lm loss: 2.431778E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.568 | TFLOPs: 22.78 | 31: iteration 356500/ 476837 | consumed samples: 91264000 | consumed tokens: 186908672000 | elapsed time per iteration (s): 0.68 | learning rate: 4.735E-05 | global batch size: 256 | lm loss: 2.427925E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.848 | TFLOPs: 22.74 | 31: iteration 356600/ 476837 | consumed samples: 91289600 | consumed tokens: 186961100800 | elapsed time per iteration (s): 0.68 | learning rate: 4.731E-05 | global batch size: 256 | lm loss: 2.430291E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.196 | TFLOPs: 22.76 | 31: iteration 356700/ 476837 | consumed samples: 91315200 | consumed tokens: 187013529600 | elapsed time per iteration (s): 0.76 | learning rate: 4.726E-05 | global batch size: 256 | lm loss: 2.434905E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 336.527 | TFLOPs: 20.36 | 31: iteration 356800/ 476837 | consumed samples: 91340800 | consumed tokens: 187065958400 | elapsed time per iteration (s): 0.89 | learning rate: 4.722E-05 | global batch size: 256 | lm 
loss: 2.437717E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 288.768 | TFLOPs: 17.47 | 31: iteration 356900/ 476837 | consumed samples: 91366400 | consumed tokens: 187118387200 | elapsed time per iteration (s): 0.68 | learning rate: 4.718E-05 | global batch size: 256 | lm loss: 2.428155E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.726 | TFLOPs: 22.79 | 31: iteration 357000/ 476837 | consumed samples: 91392000 | consumed tokens: 187170816000 | elapsed time per iteration (s): 0.68 | learning rate: 4.714E-05 | global batch size: 256 | lm loss: 2.430217E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.348 | TFLOPs: 22.77 | 31: iteration 357100/ 476837 | consumed samples: 91417600 | consumed tokens: 187223244800 | elapsed time per iteration (s): 0.68 | learning rate: 4.709E-05 | global batch size: 256 | lm loss: 2.431469E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.192 | TFLOPs: 22.76 | 31: iteration 357200/ 476837 | consumed samples: 91443200 | consumed tokens: 187275673600 | elapsed time per iteration (s): 0.68 | learning rate: 4.705E-05 | global batch size: 256 | lm loss: 2.430719E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.250 | TFLOPs: 22.76 | 31: iteration 357300/ 476837 | consumed samples: 91468800 | consumed tokens: 187328102400 | elapsed time per iteration (s): 0.68 | learning rate: 4.701E-05 | global batch size: 256 | lm loss: 2.429666E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.060 | TFLOPs: 22.75 | 31: iteration 357400/ 476837 | consumed samples: 91494400 | 
consumed tokens: 187380531200 | elapsed time per iteration (s): 0.68 | learning rate: 4.696E-05 | global batch size: 256 | lm loss: 2.430908E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.766 | TFLOPs: 22.79 | 31: iteration 357500/ 476837 | consumed samples: 91520000 | consumed tokens: 187432960000 | elapsed time per iteration (s): 0.68 | learning rate: 4.692E-05 | global batch size: 256 | lm loss: 2.432304E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.524 | TFLOPs: 22.78 | 31: iteration 357600/ 476837 | consumed samples: 91545600 | consumed tokens: 187485388800 | elapsed time per iteration (s): 0.68 | learning rate: 4.688E-05 | global batch size: 256 | lm loss: 2.426227E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.202 | TFLOPs: 22.70 | 31: iteration 357700/ 476837 | consumed samples: 91571200 | consumed tokens: 187537817600 | elapsed time per iteration (s): 0.68 | learning rate: 4.684E-05 | global batch size: 256 | lm loss: 2.428567E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.784 | TFLOPs: 22.79 | 31: iteration 357800/ 476837 | consumed samples: 91596800 | consumed tokens: 187590246400 | elapsed time per iteration (s): 0.68 | learning rate: 4.679E-05 | global batch size: 256 | lm loss: 2.427788E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.747 | TFLOPs: 22.79 | 31: iteration 357900/ 476837 | consumed samples: 91622400 | consumed tokens: 187642675200 | elapsed time per iteration (s): 0.68 | learning rate: 4.675E-05 | global batch size: 256 | lm loss: 2.428097E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 376.175 | TFLOPs: 22.76 | 0: [2023-04-28 17:47:01,724] [INFO] [logging.py:68:log_dist] [Rank 0] step=358000, skipped=0, lr=[4.6708604680499405e-05, 4.6708604680499405e-05, 4.6708604680499405e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 358000/ 476837 | consumed samples: 91648000 | consumed tokens: 187695104000 | elapsed time per iteration (s): 0.68 | learning rate: 4.671E-05 | global batch size: 256 | lm loss: 2.428804E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.607 | TFLOPs: 22.66 | 0: steps: 358000 loss: 2.4274 iter time (s): 0.691 samples/sec: 370.346 31: iteration 358100/ 476837 | consumed samples: 91673600 | consumed tokens: 187747532800 | elapsed time per iteration (s): 0.68 | learning rate: 4.667E-05 | global batch size: 256 | lm loss: 2.428360E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.583 | TFLOPs: 22.78 | 31: iteration 358200/ 476837 | consumed samples: 91699200 | consumed tokens: 187799961600 | elapsed time per iteration (s): 0.68 | learning rate: 4.662E-05 | global batch size: 256 | lm loss: 2.432729E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.255 | TFLOPs: 22.76 | 31: iteration 358300/ 476837 | consumed samples: 91724800 | consumed tokens: 187852390400 | elapsed time per iteration (s): 0.68 | learning rate: 4.658E-05 | global batch size: 256 | lm loss: 2.427639E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.740 | TFLOPs: 22.73 | 31: iteration 358400/ 476837 | consumed samples: 91750400 | consumed tokens: 187904819200 | elapsed time per iteration (s): 0.68 | learning rate: 4.654E-05 | global batch size: 256 | lm loss: 2.423012E+00 | grad norm: 0.406 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.182 | TFLOPs: 22.76 | 31: iteration 358500/ 476837 | consumed samples: 91776000 | consumed tokens: 187957248000 | elapsed time per iteration (s): 0.68 | learning rate: 4.650E-05 | global batch size: 256 | lm loss: 2.425609E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.129 | TFLOPs: 22.75 | 31: iteration 358600/ 476837 | consumed samples: 91801600 | consumed tokens: 188009676800 | elapsed time per iteration (s): 0.68 | learning rate: 4.645E-05 | global batch size: 256 | lm loss: 2.428975E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.648 | TFLOPs: 22.79 | 31: iteration 358700/ 476837 | consumed samples: 91827200 | consumed tokens: 188062105600 | elapsed time per iteration (s): 0.68 | learning rate: 4.641E-05 | global batch size: 256 | lm loss: 2.431500E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.957 | TFLOPs: 22.74 | 31: iteration 358800/ 476837 | consumed samples: 91852800 | consumed tokens: 188114534400 | elapsed time per iteration (s): 0.72 | learning rate: 4.637E-05 | global batch size: 256 | lm loss: 2.429161E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 357.795 | TFLOPs: 21.65 | 31: iteration 358900/ 476837 | consumed samples: 91878400 | consumed tokens: 188166963200 | elapsed time per iteration (s): 0.78 | learning rate: 4.633E-05 | global batch size: 256 | lm loss: 2.425345E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 328.439 | TFLOPs: 19.87 | 31: iteration 359000/ 476837 | consumed samples: 91904000 | consumed tokens: 188219392000 | elapsed time per 
iteration (s): 0.68 | learning rate: 4.628E-05 | global batch size: 256 | lm loss: 2.429747E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.753 | TFLOPs: 22.79 | 31: iteration 359100/ 476837 | consumed samples: 91929600 | consumed tokens: 188271820800 | elapsed time per iteration (s): 0.68 | learning rate: 4.624E-05 | global batch size: 256 | lm loss: 2.428472E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.335 | TFLOPs: 22.77 | 31: iteration 359200/ 476837 | consumed samples: 91955200 | consumed tokens: 188324249600 | elapsed time per iteration (s): 0.68 | learning rate: 4.620E-05 | global batch size: 256 | lm loss: 2.426947E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.077 | TFLOPs: 22.75 | 31: iteration 359300/ 476837 | consumed samples: 91980800 | consumed tokens: 188376678400 | elapsed time per iteration (s): 0.68 | learning rate: 4.616E-05 | global batch size: 256 | lm loss: 2.425507E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.362 | TFLOPs: 22.77 | 31: iteration 359400/ 476837 | consumed samples: 92006400 | consumed tokens: 188429107200 | elapsed time per iteration (s): 0.68 | learning rate: 4.612E-05 | global batch size: 256 | lm loss: 2.428940E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.435 | TFLOPs: 22.77 | 31: iteration 359500/ 476837 | consumed samples: 92032000 | consumed tokens: 188481536000 | elapsed time per iteration (s): 0.68 | learning rate: 4.607E-05 | global batch size: 256 | lm loss: 2.426747E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.809 | 
TFLOPs: 22.80 | 31: iteration 359600/ 476837 | consumed samples: 92057600 | consumed tokens: 188533964800 | elapsed time per iteration (s): 0.79 | learning rate: 4.603E-05 | global batch size: 256 | lm loss: 2.425142E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 324.999 | TFLOPs: 19.66 | 31: iteration 359700/ 476837 | consumed samples: 92083200 | consumed tokens: 188586393600 | elapsed time per iteration (s): 0.68 | learning rate: 4.599E-05 | global batch size: 256 | lm loss: 2.424719E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.340 | TFLOPs: 22.77 | 31: iteration 359800/ 476837 | consumed samples: 92108800 | consumed tokens: 188638822400 | elapsed time per iteration (s): 0.68 | learning rate: 4.595E-05 | global batch size: 256 | lm loss: 2.422191E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.786 | TFLOPs: 22.79 | 31: iteration 359900/ 476837 | consumed samples: 92134400 | consumed tokens: 188691251200 | elapsed time per iteration (s): 0.68 | learning rate: 4.590E-05 | global batch size: 256 | lm loss: 2.426738E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.827 | TFLOPs: 22.68 | 0: [2023-04-28 18:10:28,296] [INFO] [logging.py:68:log_dist] [Rank 0] step=360000, skipped=0, lr=[4.586258996346054e-05, 4.586258996346054e-05, 4.586258996346054e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 360000/ 476837 | consumed samples: 92160000 | consumed tokens: 188743680000 | elapsed time per iteration (s): 0.90 | learning rate: 4.586E-05 | global batch size: 256 | lm loss: 2.426961E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 285.365 | TFLOPs: 17.26 | 0: 
steps: 360000 loss: 2.3728 iter time (s): 0.700 samples/sec: 365.808 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 360000 | lm loss value: 3.009724E+00 | lm loss PPL: 2.028180E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 360000 to checkpoints_1b1250b1b5 0: [2023-04-28 18:10:28,687] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step360000 is begin to save! 0: [2023-04-28 18:10:28,743] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_01-model_00-model_states.pt... 0: [2023-04-28 18:10:29,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_01-model_00-model_states.pt. 0: [2023-04-28 18:10:29,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_03-model_00-model_states.pt... 0: [2023-04-28 18:10:29,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_03-model_00-model_states.pt. 0: [2023-04-28 18:10:29,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_04-model_00-model_states.pt... 0: [2023-04-28 18:10:29,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_04-model_00-model_states.pt. 0: [2023-04-28 18:10:29,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_05-model_00-model_states.pt... 0: [2023-04-28 18:10:29,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_05-model_00-model_states.pt. 
0: [2023-04-28 18:10:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_06-model_00-model_states.pt... 0: [2023-04-28 18:10:29,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_06-model_00-model_states.pt. 0: [2023-04-28 18:10:29,424] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_07-model_00-model_states.pt... 0: [2023-04-28 18:10:29,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_07-model_00-model_states.pt. 0: [2023-04-28 18:10:29,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_08-model_00-model_states.pt... 0: [2023-04-28 18:10:29,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_08-model_00-model_states.pt. 0: [2023-04-28 18:10:29,610] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_09-model_00-model_states.pt... 0: [2023-04-28 18:10:29,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_09-model_00-model_states.pt. 0: [2023-04-28 18:10:29,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_10-model_00-model_states.pt... 0: [2023-04-28 18:10:29,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_10-model_00-model_states.pt. 0: [2023-04-28 18:10:29,781] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_11-model_00-model_states.pt... 0: [2023-04-28 18:10:29,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_11-model_00-model_states.pt. 
0: [2023-04-28 18:10:29,870] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_12-model_00-model_states.pt... 0: [2023-04-28 18:10:29,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_12-model_00-model_states.pt. 0: [2023-04-28 18:10:29,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_13-model_00-model_states.pt... 0: [2023-04-28 18:10:30,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_13-model_00-model_states.pt. 0: [2023-04-28 18:10:30,047] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_14-model_00-model_states.pt... 0: [2023-04-28 18:10:30,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_14-model_00-model_states.pt. 0: [2023-04-28 18:10:30,133] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_15-model_00-model_states.pt... 0: [2023-04-28 18:10:30,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_15-model_00-model_states.pt. 0: [2023-04-28 18:10:30,209] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_16-model_00-model_states.pt... 0: [2023-04-28 18:10:30,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_16-model_00-model_states.pt. 0: [2023-04-28 18:10:30,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_17-model_00-model_states.pt... 0: [2023-04-28 18:10:30,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_17-model_00-model_states.pt. 
0: [2023-04-28 18:10:30,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_18-model_00-model_states.pt... 0: [2023-04-28 18:10:30,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_18-model_00-model_states.pt. 0: [2023-04-28 18:10:30,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_19-model_00-model_states.pt... 0: [2023-04-28 18:10:30,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_19-model_00-model_states.pt. 0: [2023-04-28 18:10:30,547] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_20-model_00-model_states.pt... 0: [2023-04-28 18:10:30,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_20-model_00-model_states.pt. 0: [2023-04-28 18:10:30,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_21-model_00-model_states.pt... 0: [2023-04-28 18:10:30,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_21-model_00-model_states.pt. 0: [2023-04-28 18:10:30,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_22-model_00-model_states.pt... 0: [2023-04-28 18:10:30,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_22-model_00-model_states.pt. 0: [2023-04-28 18:10:30,809] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_23-model_00-model_states.pt... 0: [2023-04-28 18:10:30,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_23-model_00-model_states.pt. 
0: [2023-04-28 18:10:30,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_24-model_00-model_states.pt... 0: [2023-04-28 18:10:30,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_24-model_00-model_states.pt. 0: [2023-04-28 18:10:30,983] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_25-model_00-model_states.pt... 0: [2023-04-28 18:10:31,072] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_25-model_00-model_states.pt. 0: [2023-04-28 18:10:31,073] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_26-model_00-model_states.pt... 0: [2023-04-28 18:10:31,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_26-model_00-model_states.pt. 0: [2023-04-28 18:10:31,145] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_27-model_00-model_states.pt... 0: [2023-04-28 18:10:31,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_27-model_00-model_states.pt. 0: [2023-04-28 18:10:31,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_28-model_00-model_states.pt... 0: [2023-04-28 18:10:31,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_28-model_00-model_states.pt. 0: [2023-04-28 18:10:31,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/layer_30-model_00-model_states.pt... 0: [2023-04-28 18:10:31,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/layer_30-model_00-model_states.pt. 
0: [2023-04-28 18:10:31,324] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step360000/mp_rank_00_model_states.pt 0: [2023-04-28 18:10:31,324] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/mp_rank_00_model_states.pt... 0: [2023-04-28 18:10:31,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/mp_rank_00_model_states.pt. 0: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 20: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 6: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 4: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 1: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
2: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 8: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 8: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 11: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 3: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
10: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 9: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 15: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 15: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 15: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 12: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 12: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 13: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 13: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 20: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 18: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 
18: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 18: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 24: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 24: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 24: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 24: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 17: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 17: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 27: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 27: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 
21: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 21: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 21: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 23: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 23: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 23: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 29: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 29: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 29: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 
29: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 25: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 25: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 28: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 28: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 28: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 28: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 26: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 
26: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 26: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 31: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 16: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 16: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 16: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 16: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 22: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 
22: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 6: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 4: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
2: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 8: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 8: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 8: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 8: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 8: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 11: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 3: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
10: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 9: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 9: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 14: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 14: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 15: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 15: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 15: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 12: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 13: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 
20: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 19: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 19: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 19: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 19: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 18: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 24: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 24: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 24: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 24: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 
17: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 27: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 21: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 21: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 21: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 21: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 23: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 23: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 29: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 29: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 
29: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 25: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 28: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 28: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 28: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 26: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 30: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 30: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 30: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 31: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 
31: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 16: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 22: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 6: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 4: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 1: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
7: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 8: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 11: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 11: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 11: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 3: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 10: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 
9: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 14: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 15: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 12: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 13: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 20: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 20: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 20: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 19: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 19: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 18: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 
17: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 17: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 27: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 27: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 27: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 23: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 23: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 23: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 29: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 25: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 
28: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 26: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 30: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 31: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 31: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 16: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 22: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 6: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 1: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 11: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 3: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 10: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 10: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 9: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 14: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 14: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 15: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 12: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 
12: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 12: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 13: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 20: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 19: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 18: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 18: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 17: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 17: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 27: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 25: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 
25: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 25: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 26: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 26: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 30: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 30: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 30: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 30: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 31: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 31: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 
16: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 22: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 22: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 6: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 5: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 11: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 10: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 9: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 14: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 12: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 
13: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 13: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 20: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 19: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 18: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 17: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 27: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 16: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 22: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 10: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 10: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 
9: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 9: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 14: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 14: [2023-04-28 18:10:31,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 0: [2023-04-28 18:10:31,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-04-28 18:10:31,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-28 18:10:31,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 0: [2023-04-28 18:10:31,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-04-28 18:10:31,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-28 18:10:31,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 0: [2023-04-28 18:10:31,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2023-04-28 18:10:31,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-04-28 18:10:31,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-28 18:10:31,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-28 18:10:31,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 0: [2023-04-28 18:10:31,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-28 18:10:31,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 0: [2023-04-28 18:10:31,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-04-28 18:10:31,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-28 18:10:31,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 13: [2023-04-28 18:10:31,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 13: [2023-04-28 18:10:31,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2023-04-28 18:10:31,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
13: [2023-04-28 18:10:31,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2023-04-28 18:10:31,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-28 18:10:31,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 13: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-28 18:10:31,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-28 18:10:31,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 13: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 
31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 13: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-28 18:10:31,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-28 18:10:31,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-28 18:10:31,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-28 18:10:31,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-28 18:10:31,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-28 18:10:31,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 31: [2023-04-28 18:10:31,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 13: [2023-04-28 18:10:31,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-28 18:10:31,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-28 18:10:31,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 13: [2023-04-28 18:10:31,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 13: [2023-04-28 18:10:31,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 13: [2023-04-28 18:10:31,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 0: [2023-04-28 18:10:31,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-04-28 18:10:31,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-28 18:10:31,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 0: [2023-04-28 18:10:31,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-28 18:10:31,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-28 18:10:31,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-04-28 18:10:31,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-28 18:10:31,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-04-28 18:10:31,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-28 18:10:31,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-28 18:10:31,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-28 18:10:31,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 2: [2023-04-28 18:10:31,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-04-28 18:10:31,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-28 18:10:31,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-28 18:10:31,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-04-28 18:10:31,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-28 18:10:31,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-28 18:10:31,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-28 18:10:31,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-28 18:10:31,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 1: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 
25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-28 18:10:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-28 18:10:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-28 18:10:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-28 18:10:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-28 18:10:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-28 18:10:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-28 18:10:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 25: [2023-04-28 18:10:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 25: [2023-04-28 18:10:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 5: [2023-04-28 18:10:31,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-04-28 18:10:31,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 8: [2023-04-28 18:10:31,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 8: [2023-04-28 18:10:31,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-28 18:10:31,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 8: [2023-04-28 18:10:31,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 
8: [2023-04-28 18:10:31,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2023-04-28 18:10:31,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 8: [2023-04-28 18:10:31,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 8: [2023-04-28 18:10:31,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-28 18:10:31,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 8: [2023-04-28 18:10:31,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 
26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 26: [2023-04-28 18:10:31,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-28 18:10:31,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-28 18:10:31,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 26: [2023-04-28 18:10:31,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
26: [2023-04-28 18:10:31,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-28 18:10:31,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2023-04-28 18:10:31,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 26: [2023-04-28 18:10:31,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 
27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 27: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 27: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 
30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 30: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 30: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 26: [2023-04-28 18:10:31,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2023-04-28 18:10:31,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-28 18:10:31,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
13: [2023-04-28 18:10:31,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2023-04-28 18:10:31,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-28 18:10:31,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 13: [2023-04-28 18:10:31,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2023-04-28 18:10:31,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-04-28 18:10:31,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-28 18:10:31,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-28 18:10:31,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-04-28 18:10:31,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-28 18:10:31,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-28 18:10:31,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-28 18:10:31,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 7: [2023-04-28 18:10:31,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 8: [2023-04-28 18:10:31,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-28 18:10:31,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 8: [2023-04-28 18:10:31,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 8: [2023-04-28 18:10:31,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-28 18:10:31,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 8: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 8: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 8: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 
8: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 8: [2023-04-28 18:10:31,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 8: [2023-04-28 18:10:31,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 7: [2023-04-28 18:10:31,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-28 18:10:31,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-28 18:10:31,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 2: [2023-04-28 18:10:31,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 31: [2023-04-28 18:10:31,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 
2: [2023-04-28 18:10:31,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-28 18:10:31,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 31: [2023-04-28 18:10:31,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-28 18:10:31,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 2: [2023-04-28 18:10:31,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-28 18:10:31,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-28 18:10:31,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 
19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 19: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 19: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
19: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 19: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 19: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-28 18:10:31,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 14: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 14: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 
14: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 14: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2023-04-28 18:10:31,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 14: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-28 18:10:31,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-28 18:10:31,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 14: [2023-04-28 18:10:31,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 14: [2023-04-28 18:10:31,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 14: [2023-04-28 18:10:31,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 14: [2023-04-28 18:10:31,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 14: [2023-04-28 18:10:31,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 14: [2023-04-28 18:10:31,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-28 18:10:31,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 14: [2023-04-28 18:10:31,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-28 18:10:31,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-28 18:10:31,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-04-28 18:10:31,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-28 18:10:31,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-28 18:10:31,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-28 18:10:31,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-28 18:10:31,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-28 18:10:31,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 3: [2023-04-28 18:10:31,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 3: [2023-04-28 18:10:31,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 3: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 31: [2023-04-28 18:10:31,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-28 18:10:31,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-28 18:10:31,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 4: [2023-04-28 18:10:31,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-28 18:10:31,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-04-28 18:10:31,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-28 18:10:31,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 29: [2023-04-28 18:10:31,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 
29: [2023-04-28 18:10:31,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-28 18:10:31,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 5: [2023-04-28 18:10:31,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 5: [2023-04-28 18:10:31,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 29: [2023-04-28 18:10:31,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 5: [2023-04-28 18:10:31,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
29: [2023-04-28 18:10:31,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-28 18:10:31,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 5: [2023-04-28 18:10:31,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 29: [2023-04-28 18:10:31,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-28 18:10:31,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2023-04-28 18:10:31,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 5: [2023-04-28 18:10:31,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 29: [2023-04-28 18:10:31,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-28 18:10:31,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 5: [2023-04-28 18:10:31,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 29: [2023-04-28 18:10:31,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 5: [2023-04-28 18:10:31,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-28 18:10:31,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 5: [2023-04-28 18:10:31,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-28 18:10:31,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-28 18:10:31,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 5: [2023-04-28 18:10:31,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-04-28 18:10:31,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-28 18:10:31,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 
11: [2023-04-28 18:10:31,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-28 18:10:31,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-28 18:10:31,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-28 18:10:31,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-28 18:10:31,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2023-04-28 18:10:31,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-28 18:10:31,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2023-04-28 18:10:31,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 11: [2023-04-28 18:10:31,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-28 18:10:31,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-28 18:10:31,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-28 18:10:31,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-28 18:10:31,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-28 18:10:31,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 6: [2023-04-28 18:10:31,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-28 18:10:31,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
6: [2023-04-28 18:10:31,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 6: [2023-04-28 18:10:31,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 
23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 23: [2023-04-28 18:10:31,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-28 18:10:31,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-28 18:10:31,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 23: [2023-04-28 18:10:31,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 23: [2023-04-28 18:10:31,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-28 18:10:31,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
23: [2023-04-28 18:10:31,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 23: [2023-04-28 18:10:31,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 23: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 0: [2023-04-28 18:10:31,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-04-28 18:10:31,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 
10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 10: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 10: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 10: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 9: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 9: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 10: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
9: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-28 18:10:31,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 9: [2023-04-28 18:10:31,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
4: [2023-04-28 18:10:31,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-28 18:10:31,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-04-28 18:10:31,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 4: [2023-04-28 18:10:31,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 4: [2023-04-28 18:10:31,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-04-28 18:10:31,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-04-28 18:10:31,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 4: [2023-04-28 18:10:31,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-04-28 18:10:31,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-04-28 18:10:31,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 4: [2023-04-28 18:10:31,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-04-28 18:10:31,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-04-28 18:10:31,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 4: [2023-04-28 18:10:31,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-28 18:10:31,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-04-28 18:10:31,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 4: [2023-04-28 18:10:31,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-04-28 18:10:31,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-28 18:10:31,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 4: [2023-04-28 18:10:31,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-04-28 18:10:31,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-04-28 18:10:31,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 
12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 
12: [2023-04-28 18:10:31,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-28 18:10:31,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2023-04-28 18:10:31,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-28 18:10:31,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-28 18:10:31,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-28 18:10:31,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-28 18:10:31,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 12: [2023-04-28 18:10:31,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 12: [2023-04-28 18:10:31,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 
21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 21: [2023-04-28 18:10:31,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-28 18:10:31,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-28 18:10:31,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-28 18:10:31,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-28 18:10:31,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 21: [2023-04-28 18:10:31,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
21: [2023-04-28 18:10:31,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-28 18:10:31,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 21: [2023-04-28 18:10:31,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 24: [2023-04-28 18:10:31,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 
24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 
18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 24: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 24: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 24: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 24: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
18: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 18: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 18: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-28 18:10:31,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 18: [2023-04-28 18:10:31,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 
16: [2023-04-28 18:10:31,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-28 18:10:31,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-28 18:10:31,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-28 18:10:31,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-28 18:10:31,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-28 18:10:31,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-28 18:10:31,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-28 18:10:31,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 16: [2023-04-28 18:10:31,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 5: [2023-04-28 18:10:31,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-28 18:10:31,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-04-28 18:10:31,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 
20: [2023-04-28 18:10:31,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2023-04-28 18:10:31,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
20: [2023-04-28 18:10:31,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-28 18:10:31,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-28 18:10:31,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2023-04-28 18:10:31,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-28 18:10:31,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-28 18:10:31,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
20: [2023-04-28 18:10:31,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 
17: [2023-04-28 18:10:31,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2023-04-28 18:10:31,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-28 18:10:31,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-28 18:10:31,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-28 18:10:31,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2023-04-28 18:10:31,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 17: [2023-04-28 18:10:31,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
17: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 17: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2023-04-28 18:10:31,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-28 18:10:31,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 17: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 
28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 28: [2023-04-28 18:10:31,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-28 18:10:31,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-28 18:10:31,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-28 18:10:31,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-28 18:10:31,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-28 18:10:31,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-28 18:10:31,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint 
global_step360000 is ready now! 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 28: [2023-04-28 18:10:31,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-28 18:10:31,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 
22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-28 18:10:31,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-28 18:10:31,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-28 18:10:31,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-28 18:10:31,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-28 18:10:31,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2023-04-28 18:10:31,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-28 18:10:31,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-28 18:10:31,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 22: [2023-04-28 18:10:31,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 5: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-04-28 18:10:31,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 
15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 
15: [2023-04-28 18:10:31,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-28 18:10:31,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-28 18:10:31,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-28 18:10:31,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-28 18:10:31,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 15: [2023-04-28 18:10:31,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 15: [2023-04-28 18:10:31,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 
15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 15: [2023-04-28 18:10:31,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 15: [2023-04-28 18:10:31,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 15: [2023-04-28 18:10:31,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step360000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 15: [2023-04-28 18:10:31,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step360000 is ready now! 0: successfully saved checkpoint at iteration 360000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 3153.94 31: iteration 360100/ 476837 | consumed samples: 92185600 | consumed tokens: 188796108800 | elapsed time per iteration (s): 0.71 | learning rate: 4.582E-05 | global batch size: 256 | lm loss: 2.427409E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 359.279 | TFLOPs: 21.74 | 31: iteration 360200/ 476837 | consumed samples: 92211200 | consumed tokens: 188848537600 | elapsed time per iteration (s): 0.68 | learning rate: 4.578E-05 | global batch size: 256 | lm loss: 2.428134E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.967 | TFLOPs: 22.68 | 31: iteration 360300/ 476837 | consumed samples: 92236800 | consumed tokens: 188900966400 | elapsed time per iteration (s): 0.93 | learning rate: 4.574E-05 | global batch size: 256 | lm loss: 2.429218E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 275.819 | TFLOPs: 16.69 | 31: iteration 360400/ 476837 | consumed samples: 92262400 | 
consumed tokens: 188953395200 | elapsed time per iteration (s): 0.71 | learning rate: 4.569E-05 | global batch size: 256 | lm loss: 2.424354E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 359.241 | TFLOPs: 21.73 | 31: iteration 360500/ 476837 | consumed samples: 92288000 | consumed tokens: 189005824000 | elapsed time per iteration (s): 0.68 | learning rate: 4.565E-05 | global batch size: 256 | lm loss: 2.423691E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.747 | TFLOPs: 22.67 | 31: iteration 360600/ 476837 | consumed samples: 92313600 | consumed tokens: 189058252800 | elapsed time per iteration (s): 0.68 | learning rate: 4.561E-05 | global batch size: 256 | lm loss: 2.430424E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.617 | TFLOPs: 22.72 | 31: iteration 360700/ 476837 | consumed samples: 92339200 | consumed tokens: 189110681600 | elapsed time per iteration (s): 0.68 | learning rate: 4.557E-05 | global batch size: 256 | lm loss: 2.425596E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.209 | TFLOPs: 22.64 | 31: iteration 360800/ 476837 | consumed samples: 92364800 | consumed tokens: 189163110400 | elapsed time per iteration (s): 0.68 | learning rate: 4.553E-05 | global batch size: 256 | lm loss: 2.423831E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.952 | TFLOPs: 22.74 | 31: iteration 360900/ 476837 | consumed samples: 92390400 | consumed tokens: 189215539200 | elapsed time per iteration (s): 0.68 | learning rate: 4.549E-05 | global batch size: 256 | lm loss: 2.427935E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 374.548 | TFLOPs: 22.66 | 31: iteration 361000/ 476837 | consumed samples: 92416000 | consumed tokens: 189267968000 | elapsed time per iteration (s): 0.68 | learning rate: 4.544E-05 | global batch size: 256 | lm loss: 2.424387E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.415 | TFLOPs: 22.65 | 31: iteration 361100/ 476837 | consumed samples: 92441600 | consumed tokens: 189320396800 | elapsed time per iteration (s): 0.68 | learning rate: 4.540E-05 | global batch size: 256 | lm loss: 2.423781E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.149 | TFLOPs: 22.70 | 31: iteration 361200/ 476837 | consumed samples: 92467200 | consumed tokens: 189372825600 | elapsed time per iteration (s): 0.68 | learning rate: 4.536E-05 | global batch size: 256 | lm loss: 2.425657E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.460 | TFLOPs: 22.77 | 31: iteration 361300/ 476837 | consumed samples: 92492800 | consumed tokens: 189425254400 | elapsed time per iteration (s): 0.68 | learning rate: 4.532E-05 | global batch size: 256 | lm loss: 2.425064E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.507 | TFLOPs: 22.72 | 31: iteration 361400/ 476837 | consumed samples: 92518400 | consumed tokens: 189477683200 | elapsed time per iteration (s): 0.68 | learning rate: 4.528E-05 | global batch size: 256 | lm loss: 2.427048E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.346 | TFLOPs: 22.77 | 31: iteration 361500/ 476837 | consumed samples: 92544000 | consumed tokens: 189530112000 | elapsed time per iteration (s): 0.68 | learning rate: 4.524E-05 | global batch 
size: 256 | lm loss: 2.423800E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.678 | TFLOPs: 22.79 | 31: iteration 361600/ 476837 | consumed samples: 92569600 | consumed tokens: 189582540800 | elapsed time per iteration (s): 0.68 | learning rate: 4.519E-05 | global batch size: 256 | lm loss: 2.425625E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.207 | TFLOPs: 22.76 | 31: iteration 361700/ 476837 | consumed samples: 92595200 | consumed tokens: 189634969600 | elapsed time per iteration (s): 0.68 | learning rate: 4.515E-05 | global batch size: 256 | lm loss: 2.431480E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.502 | TFLOPs: 22.78 | 31: iteration 361800/ 476837 | consumed samples: 92620800 | consumed tokens: 189687398400 | elapsed time per iteration (s): 0.68 | learning rate: 4.511E-05 | global batch size: 256 | lm loss: 2.425657E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.625 | TFLOPs: 22.78 | 31: iteration 361900/ 476837 | consumed samples: 92646400 | consumed tokens: 189739827200 | elapsed time per iteration (s): 0.68 | learning rate: 4.507E-05 | global batch size: 256 | lm loss: 2.420728E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.684 | TFLOPs: 22.79 | 0: [2023-04-28 18:33:41,946] [INFO] [logging.py:68:log_dist] [Rank 0] step=362000, skipped=0, lr=[4.502793723028153e-05, 4.502793723028153e-05, 4.502793723028153e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 362000/ 476837 | consumed samples: 92672000 | consumed tokens: 189792256000 | elapsed time per iteration (s): 0.68 | learning rate: 4.503E-05 | global batch size: 256 | lm 
loss: 2.422117E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.603 | TFLOPs: 22.78 | 0: steps: 362000 loss: 2.3821 iter time (s): 0.693 samples/sec: 369.433 31: iteration 362100/ 476837 | consumed samples: 92697600 | consumed tokens: 189844684800 | elapsed time per iteration (s): 0.68 | learning rate: 4.499E-05 | global batch size: 256 | lm loss: 2.429164E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.138 | TFLOPs: 22.76 | 31: iteration 362200/ 476837 | consumed samples: 92723200 | consumed tokens: 189897113600 | elapsed time per iteration (s): 0.68 | learning rate: 4.495E-05 | global batch size: 256 | lm loss: 2.422261E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.077 | TFLOPs: 22.75 | 31: iteration 362300/ 476837 | consumed samples: 92748800 | consumed tokens: 189949542400 | elapsed time per iteration (s): 0.78 | learning rate: 4.490E-05 | global batch size: 256 | lm loss: 2.421019E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 326.908 | TFLOPs: 19.78 | 31: iteration 362400/ 476837 | consumed samples: 92774400 | consumed tokens: 190001971200 | elapsed time per iteration (s): 0.68 | learning rate: 4.486E-05 | global batch size: 256 | lm loss: 2.424677E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.802 | TFLOPs: 22.74 | 31: iteration 362500/ 476837 | consumed samples: 92800000 | consumed tokens: 190054400000 | elapsed time per iteration (s): 0.68 | learning rate: 4.482E-05 | global batch size: 256 | lm loss: 2.427492E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.358 | TFLOPs: 
22.77 | 31: iteration 362600/ 476837 | consumed samples: 92825600 | consumed tokens: 190106828800 | elapsed time per iteration (s): 0.68 | learning rate: 4.478E-05 | global batch size: 256 | lm loss: 2.424310E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.404 | TFLOPs: 22.77 | 31: iteration 362700/ 476837 | consumed samples: 92851200 | consumed tokens: 190159257600 | elapsed time per iteration (s): 0.70 | learning rate: 4.474E-05 | global batch size: 256 | lm loss: 2.428536E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 367.358 | TFLOPs: 22.22 | 31: iteration 362800/ 476837 | consumed samples: 92876800 | consumed tokens: 190211686400 | elapsed time per iteration (s): 0.68 | learning rate: 4.470E-05 | global batch size: 256 | lm loss: 2.427415E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.180 | TFLOPs: 22.76 | 31: iteration 362900/ 476837 | consumed samples: 92902400 | consumed tokens: 190264115200 | elapsed time per iteration (s): 0.68 | learning rate: 4.466E-05 | global batch size: 256 | lm loss: 2.422674E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.566 | TFLOPs: 22.78 | 31: iteration 363000/ 476837 | consumed samples: 92928000 | consumed tokens: 190316544000 | elapsed time per iteration (s): 0.68 | learning rate: 4.461E-05 | global batch size: 256 | lm loss: 2.424570E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.755 | TFLOPs: 22.79 | 31: iteration 363100/ 476837 | consumed samples: 92953600 | consumed tokens: 190368972800 | elapsed time per iteration (s): 0.68 | learning rate: 4.457E-05 | global batch size: 256 | lm loss: 2.424348E+00 | grad norm: 0.395 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.766 | TFLOPs: 22.79 | 31: iteration 363200/ 476837 | consumed samples: 92979200 | consumed tokens: 190421401600 | elapsed time per iteration (s): 0.68 | learning rate: 4.453E-05 | global batch size: 256 | lm loss: 2.422442E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.536 | TFLOPs: 22.78 | 31: iteration 363300/ 476837 | consumed samples: 93004800 | consumed tokens: 190473830400 | elapsed time per iteration (s): 0.68 | learning rate: 4.449E-05 | global batch size: 256 | lm loss: 2.425039E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.901 | TFLOPs: 22.80 | 31: iteration 363400/ 476837 | consumed samples: 93030400 | consumed tokens: 190526259200 | elapsed time per iteration (s): 0.68 | learning rate: 4.445E-05 | global batch size: 256 | lm loss: 2.421216E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.984 | TFLOPs: 22.81 | 31: iteration 363500/ 476837 | consumed samples: 93056000 | consumed tokens: 190578688000 | elapsed time per iteration (s): 0.68 | learning rate: 4.441E-05 | global batch size: 256 | lm loss: 2.422755E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.520 | TFLOPs: 22.78 | 31: iteration 363600/ 476837 | consumed samples: 93081600 | consumed tokens: 190631116800 | elapsed time per iteration (s): 0.68 | learning rate: 4.437E-05 | global batch size: 256 | lm loss: 2.418341E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.134 | TFLOPs: 22.69 | 31: iteration 363700/ 476837 | consumed samples: 93107200 | consumed tokens: 190683545600 | elapsed time 
per iteration (s): 0.68 | learning rate: 4.433E-05 | global batch size: 256 | lm loss: 2.425411E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.965 | TFLOPs: 22.81 | 31: iteration 363800/ 476837 | consumed samples: 93132800 | consumed tokens: 190735974400 | elapsed time per iteration (s): 0.70 | learning rate: 4.429E-05 | global batch size: 256 | lm loss: 2.422941E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 364.668 | TFLOPs: 22.06 | 31: iteration 363900/ 476837 | consumed samples: 93158400 | consumed tokens: 190788403200 | elapsed time per iteration (s): 0.95 | learning rate: 4.425E-05 | global batch size: 256 | lm loss: 2.428931E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 268.570 | TFLOPs: 16.25 | 0: [2023-04-28 18:57:04,230] [INFO] [logging.py:68:log_dist] [Rank 0] step=364000, skipped=0, lr=[4.4204794340236395e-05, 4.4204794340236395e-05, 4.4204794340236395e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 364000/ 476837 | consumed samples: 93184000 | consumed tokens: 190840832000 | elapsed time per iteration (s): 0.69 | learning rate: 4.420E-05 | global batch size: 256 | lm loss: 2.427325E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.242 | TFLOPs: 22.58 | 0: steps: 364000 loss: 2.4588 iter time (s): 0.698 samples/sec: 366.808 31: iteration 364100/ 476837 | consumed samples: 93209600 | consumed tokens: 190893260800 | elapsed time per iteration (s): 0.68 | learning rate: 4.416E-05 | global batch size: 256 | lm loss: 2.425906E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.697 | TFLOPs: 22.79 | 31: iteration 364200/ 476837 | consumed samples: 
93235200 | consumed tokens: 190945689600 | elapsed time per iteration (s): 0.68 | learning rate: 4.412E-05 | global batch size: 256 | lm loss: 2.423643E+00 | grad norm: 0.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.653 | TFLOPs: 22.79 | 31: iteration 364300/ 476837 | consumed samples: 93260800 | consumed tokens: 190998118400 | elapsed time per iteration (s): 0.68 | learning rate: 4.408E-05 | global batch size: 256 | lm loss: 2.425747E+00 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.835 | TFLOPs: 22.80 | 31: iteration 364400/ 476837 | consumed samples: 93286400 | consumed tokens: 191050547200 | elapsed time per iteration (s): 0.68 | learning rate: 4.404E-05 | global batch size: 256 | lm loss: 2.425777E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.001 | TFLOPs: 22.75 | 31: iteration 364500/ 476837 | consumed samples: 93312000 | consumed tokens: 191102976000 | elapsed time per iteration (s): 0.68 | learning rate: 4.400E-05 | global batch size: 256 | lm loss: 2.425449E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.207 | TFLOPs: 22.76 | 31: iteration 364600/ 476837 | consumed samples: 93337600 | consumed tokens: 191155404800 | elapsed time per iteration (s): 0.68 | learning rate: 4.396E-05 | global batch size: 256 | lm loss: 2.420082E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.863 | TFLOPs: 22.80 | 31: iteration 364700/ 476837 | consumed samples: 93363200 | consumed tokens: 191207833600 | elapsed time per iteration (s): 0.68 | learning rate: 4.392E-05 | global batch size: 256 | lm loss: 2.422880E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 376.102 | TFLOPs: 22.75 | 31: iteration 364800/ 476837 | consumed samples: 93388800 | consumed tokens: 191260262400 | elapsed time per iteration (s): 0.68 | learning rate: 4.388E-05 | global batch size: 256 | lm loss: 2.417460E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.944 | TFLOPs: 22.80 | 31: iteration 364900/ 476837 | consumed samples: 93414400 | consumed tokens: 191312691200 | elapsed time per iteration (s): 0.68 | learning rate: 4.384E-05 | global batch size: 256 | lm loss: 2.423120E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.146 | TFLOPs: 22.76 | 31: iteration 365000/ 476837 | consumed samples: 93440000 | consumed tokens: 191365120000 | elapsed time per iteration (s): 0.68 | learning rate: 4.380E-05 | global batch size: 256 | lm loss: 2.427125E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.897 | TFLOPs: 22.80 | 31: iteration 365100/ 476837 | consumed samples: 93465600 | consumed tokens: 191417548800 | elapsed time per iteration (s): 0.68 | learning rate: 4.376E-05 | global batch size: 256 | lm loss: 2.425418E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.515 | TFLOPs: 22.78 | 31: iteration 365200/ 476837 | consumed samples: 93491200 | consumed tokens: 191469977600 | elapsed time per iteration (s): 0.68 | learning rate: 4.372E-05 | global batch size: 256 | lm loss: 2.423164E+00 | grad norm: 0.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.887 | TFLOPs: 22.80 | 31: iteration 365300/ 476837 | consumed samples: 93516800 | consumed tokens: 191522406400 | elapsed time per iteration (s): 0.68 | learning rate: 4.368E-05 | global 
batch size: 256 | lm loss: 2.426369E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.095 | TFLOPs: 22.75 | 31: iteration 365400/ 476837 | consumed samples: 93542400 | consumed tokens: 191574835200 | elapsed time per iteration (s): 0.68 | learning rate: 4.364E-05 | global batch size: 256 | lm loss: 2.426488E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.090 | TFLOPs: 22.75 | 31: iteration 365500/ 476837 | consumed samples: 93568000 | consumed tokens: 191627264000 | elapsed time per iteration (s): 0.68 | learning rate: 4.360E-05 | global batch size: 256 | lm loss: 2.421438E+00 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.435 | TFLOPs: 22.77 | 31: iteration 365600/ 476837 | consumed samples: 93593600 | consumed tokens: 191679692800 | elapsed time per iteration (s): 0.68 | learning rate: 4.355E-05 | global batch size: 256 | lm loss: 2.422012E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.840 | TFLOPs: 22.80 | 31: iteration 365700/ 476837 | consumed samples: 93619200 | consumed tokens: 191732121600 | elapsed time per iteration (s): 0.68 | learning rate: 4.351E-05 | global batch size: 256 | lm loss: 2.420740E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.228 | TFLOPs: 22.70 | 31: iteration 365800/ 476837 | consumed samples: 93644800 | consumed tokens: 191784550400 | elapsed time per iteration (s): 0.68 | learning rate: 4.347E-05 | global batch size: 256 | lm loss: 2.422584E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.842 | TFLOPs: 22.80 | 31: iteration 365900/ 476837 | consumed 
samples: 93670400 | consumed tokens: 191836979200 | elapsed time per iteration (s): 0.68 | learning rate: 4.343E-05 | global batch size: 256 | lm loss: 2.422669E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.831 | TFLOPs: 22.68 | 0: [2023-04-28 19:19:44,729] [INFO] [logging.py:68:log_dist] [Rank 0] step=366000, skipped=0, lr=[4.339330711362286e-05, 4.339330711362286e-05, 4.339330711362286e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 366000/ 476837 | consumed samples: 93696000 | consumed tokens: 191889408000 | elapsed time per iteration (s): 0.68 | learning rate: 4.339E-05 | global batch size: 256 | lm loss: 2.422192E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.567 | TFLOPs: 22.72 | 0: steps: 366000 loss: 2.4404 iter time (s): 0.677 samples/sec: 378.239 31: iteration 366100/ 476837 | consumed samples: 93721600 | consumed tokens: 191941836800 | elapsed time per iteration (s): 0.68 | learning rate: 4.335E-05 | global batch size: 256 | lm loss: 2.418359E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.914 | TFLOPs: 22.74 | 31: iteration 366200/ 476837 | consumed samples: 93747200 | consumed tokens: 191994265600 | elapsed time per iteration (s): 0.68 | learning rate: 4.331E-05 | global batch size: 256 | lm loss: 2.420493E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.270 | TFLOPs: 22.70 | 31: iteration 366300/ 476837 | consumed samples: 93772800 | consumed tokens: 192046694400 | elapsed time per iteration (s): 0.68 | learning rate: 4.327E-05 | global batch size: 256 | lm loss: 2.423335E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.336 | 
TFLOPs: 22.65 | 31: iteration 366400/ 476837 | consumed samples: 93798400 | consumed tokens: 192099123200 | elapsed time per iteration (s): 0.68 | learning rate: 4.323E-05 | global batch size: 256 | lm loss: 2.421189E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.464 | TFLOPs: 22.71 | 31: iteration 366500/ 476837 | consumed samples: 93824000 | consumed tokens: 192151552000 | elapsed time per iteration (s): 0.68 | learning rate: 4.319E-05 | global batch size: 256 | lm loss: 2.416538E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.920 | TFLOPs: 22.74 | 31: iteration 366600/ 476837 | consumed samples: 93849600 | consumed tokens: 192203980800 | elapsed time per iteration (s): 0.78 | learning rate: 4.315E-05 | global batch size: 256 | lm loss: 2.425165E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 327.140 | TFLOPs: 19.79 | 31: iteration 366700/ 476837 | consumed samples: 93875200 | consumed tokens: 192256409600 | elapsed time per iteration (s): 0.78 | learning rate: 4.311E-05 | global batch size: 256 | lm loss: 2.421778E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 327.328 | TFLOPs: 19.80 | 31: iteration 366800/ 476837 | consumed samples: 93900800 | consumed tokens: 192308838400 | elapsed time per iteration (s): 0.71 | learning rate: 4.307E-05 | global batch size: 256 | lm loss: 2.419494E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.723 | TFLOPs: 21.88 | 31: iteration 366900/ 476837 | consumed samples: 93926400 | consumed tokens: 192361267200 | elapsed time per iteration (s): 0.68 | learning rate: 4.303E-05 | global batch size: 256 | lm loss: 2.424352E+00 | grad norm: 
0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.624 | TFLOPs: 22.78 | 31: iteration 367000/ 476837 | consumed samples: 93952000 | consumed tokens: 192413696000 | elapsed time per iteration (s): 0.68 | learning rate: 4.299E-05 | global batch size: 256 | lm loss: 2.417278E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.005 | TFLOPs: 22.75 | 31: iteration 367100/ 476837 | consumed samples: 93977600 | consumed tokens: 192466124800 | elapsed time per iteration (s): 0.68 | learning rate: 4.295E-05 | global batch size: 256 | lm loss: 2.422857E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.742 | TFLOPs: 22.79 | 31: iteration 367200/ 476837 | consumed samples: 94003200 | consumed tokens: 192518553600 | elapsed time per iteration (s): 0.68 | learning rate: 4.291E-05 | global batch size: 256 | lm loss: 2.418705E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.698 | TFLOPs: 22.79 | 31: iteration 367300/ 476837 | consumed samples: 94028800 | consumed tokens: 192570982400 | elapsed time per iteration (s): 0.68 | learning rate: 4.287E-05 | global batch size: 256 | lm loss: 2.425271E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.094 | TFLOPs: 22.75 | 31: iteration 367400/ 476837 | consumed samples: 94054400 | consumed tokens: 192623411200 | elapsed time per iteration (s): 0.68 | learning rate: 4.283E-05 | global batch size: 256 | lm loss: 2.422301E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.557 | TFLOPs: 22.66 | 31: iteration 367500/ 476837 | consumed samples: 94080000 | consumed tokens: 192675840000 | elapsed 
time per iteration (s): 0.96 | learning rate: 4.279E-05 | global batch size: 256 | lm loss: 2.423531E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 266.248 | TFLOPs: 16.11 | 31: iteration 367600/ 476837 | consumed samples: 94105600 | consumed tokens: 192728268800 | elapsed time per iteration (s): 0.71 | learning rate: 4.275E-05 | global batch size: 256 | lm loss: 2.419211E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 361.654 | TFLOPs: 21.88 | 31: iteration 367700/ 476837 | consumed samples: 94131200 | consumed tokens: 192780697600 | elapsed time per iteration (s): 0.68 | learning rate: 4.271E-05 | global batch size: 256 | lm loss: 2.421516E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.228 | TFLOPs: 22.76 | 31: iteration 367800/ 476837 | consumed samples: 94156800 | consumed tokens: 192833126400 | elapsed time per iteration (s): 0.68 | learning rate: 4.267E-05 | global batch size: 256 | lm loss: 2.422461E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.757 | TFLOPs: 22.79 | 31: iteration 367900/ 476837 | consumed samples: 94182400 | consumed tokens: 192885555200 | elapsed time per iteration (s): 0.68 | learning rate: 4.263E-05 | global batch size: 256 | lm loss: 2.417639E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.537 | TFLOPs: 22.72 | 0: [2023-04-28 19:43:20,495] [INFO] [logging.py:68:log_dist] [Rank 0] step=368000, skipped=0, lr=[4.259361930593036e-05, 4.259361930593036e-05, 4.259361930593036e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 368000/ 476837 | consumed samples: 94208000 | consumed tokens: 192937984000 | elapsed time per iteration 
(s): 0.68 | learning rate: 4.259E-05 | global batch size: 256 | lm loss: 2.423177E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.123 | TFLOPs: 22.75 | 0: steps: 368000 loss: 2.3951 iter time (s): 0.704 samples/sec: 363.475 31: iteration 368100/ 476837 | consumed samples: 94233600 | consumed tokens: 192990412800 | elapsed time per iteration (s): 0.68 | learning rate: 4.255E-05 | global batch size: 256 | lm loss: 2.421602E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.652 | TFLOPs: 22.79 | 31: iteration 368200/ 476837 | consumed samples: 94259200 | consumed tokens: 193042841600 | elapsed time per iteration (s): 0.68 | learning rate: 4.251E-05 | global batch size: 256 | lm loss: 2.426764E+00 | grad norm: 0.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.956 | TFLOPs: 22.74 | 31: iteration 368300/ 476837 | consumed samples: 94284800 | consumed tokens: 193095270400 | elapsed time per iteration (s): 0.68 | learning rate: 4.247E-05 | global batch size: 256 | lm loss: 2.418459E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.741 | TFLOPs: 22.79 | 31: iteration 368400/ 476837 | consumed samples: 94310400 | consumed tokens: 193147699200 | elapsed time per iteration (s): 0.68 | learning rate: 4.244E-05 | global batch size: 256 | lm loss: 2.417019E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.708 | TFLOPs: 22.79 | 31: iteration 368500/ 476837 | consumed samples: 94336000 | consumed tokens: 193200128000 | elapsed time per iteration (s): 0.68 | learning rate: 4.240E-05 | global batch size: 256 | lm loss: 2.424431E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 376.311 | TFLOPs: 22.77 | 31: iteration 368600/ 476837 | consumed samples: 94361600 | consumed tokens: 193252556800 | elapsed time per iteration (s): 0.68 | learning rate: 4.236E-05 | global batch size: 256 | lm loss: 2.423580E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.750 | TFLOPs: 22.79 | 31: iteration 368700/ 476837 | consumed samples: 94387200 | consumed tokens: 193304985600 | elapsed time per iteration (s): 0.68 | learning rate: 4.232E-05 | global batch size: 256 | lm loss: 2.419229E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.767 | TFLOPs: 22.79 | 31: iteration 368800/ 476837 | consumed samples: 94412800 | consumed tokens: 193357414400 | elapsed time per iteration (s): 0.68 | learning rate: 4.228E-05 | global batch size: 256 | lm loss: 2.421322E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.104 | TFLOPs: 22.75 | 31: iteration 368900/ 476837 | consumed samples: 94438400 | consumed tokens: 193409843200 | elapsed time per iteration (s): 0.68 | learning rate: 4.224E-05 | global batch size: 256 | lm loss: 2.419843E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.377 | TFLOPs: 22.77 | 31: iteration 369000/ 476837 | consumed samples: 94464000 | consumed tokens: 193462272000 | elapsed time per iteration (s): 0.68 | learning rate: 4.220E-05 | global batch size: 256 | lm loss: 2.419605E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.744 | TFLOPs: 22.79 | 31: iteration 369100/ 476837 | consumed samples: 94489600 | consumed tokens: 193514700800 | elapsed time per iteration (s): 0.68 | learning rate: 4.216E-05 | 
global batch size: 256 | lm loss: 2.416219E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.677 | TFLOPs: 22.79 | 31: iteration 369200/ 476837 | consumed samples: 94515200 | consumed tokens: 193567129600 | elapsed time per iteration (s): 0.68 | learning rate: 4.212E-05 | global batch size: 256 | lm loss: 2.421462E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.293 | TFLOPs: 22.70 | 31: iteration 369300/ 476837 | consumed samples: 94540800 | consumed tokens: 193619558400 | elapsed time per iteration (s): 0.68 | learning rate: 4.208E-05 | global batch size: 256 | lm loss: 2.417191E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.354 | TFLOPs: 22.77 | 31: iteration 369400/ 476837 | consumed samples: 94566400 | consumed tokens: 193671987200 | elapsed time per iteration (s): 0.68 | learning rate: 4.204E-05 | global batch size: 256 | lm loss: 2.415636E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.590 | TFLOPs: 22.72 | 31: iteration 369500/ 476837 | consumed samples: 94592000 | consumed tokens: 193724416000 | elapsed time per iteration (s): 0.68 | learning rate: 4.200E-05 | global batch size: 256 | lm loss: 2.419953E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.919 | TFLOPs: 22.68 | 31: iteration 369600/ 476837 | consumed samples: 94617600 | consumed tokens: 193776844800 | elapsed time per iteration (s): 0.68 | learning rate: 4.196E-05 | global batch size: 256 | lm loss: 2.417974E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.321 | TFLOPs: 22.77 | 31: iteration 369700/ 476837 | consumed 
samples: 94643200 | consumed tokens: 193829273600 | elapsed time per iteration (s): 0.68 | learning rate: 4.192E-05 | global batch size: 256 | lm loss: 2.419100E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.608 | TFLOPs: 22.78 | 31: iteration 369800/ 476837 | consumed samples: 94668800 | consumed tokens: 193881702400 | elapsed time per iteration (s): 0.68 | learning rate: 4.188E-05 | global batch size: 256 | lm loss: 2.416649E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.586 | TFLOPs: 22.78 | 31: iteration 369900/ 476837 | consumed samples: 94694400 | consumed tokens: 193934131200 | elapsed time per iteration (s): 0.68 | learning rate: 4.184E-05 | global batch size: 256 | lm loss: 2.421285E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.577 | TFLOPs: 22.78 | 0: [2023-04-28 20:06:01,064] [INFO] [logging.py:68:log_dist] [Rank 0] step=370000, skipped=0, lr=[4.1805872582373684e-05, 4.1805872582373684e-05, 4.1805872582373684e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 370000/ 476837 | consumed samples: 94720000 | consumed tokens: 193986560000 | elapsed time per iteration (s): 0.68 | learning rate: 4.181E-05 | global batch size: 256 | lm loss: 2.413317E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.238 | TFLOPs: 22.76 | 0: steps: 370000 loss: 2.4288 iter time (s): 0.677 samples/sec: 378.255 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 370000 | lm loss value: 2.983901E+00 | lm loss PPL: 1.976476E+01 | 31: ------------------------------------------------------------------------------------------------- 31: iteration 370100/ 476837 | 
consumed samples: 94745600 | consumed tokens: 194038988800 | elapsed time per iteration (s): 0.68 | learning rate: 4.177E-05 | global batch size: 256 | lm loss: 2.420462E+00 | grad norm: 0.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.602 | TFLOPs: 22.66 | 31: iteration 370200/ 476837 | consumed samples: 94771200 | consumed tokens: 194091417600 | elapsed time per iteration (s): 0.68 | learning rate: 4.173E-05 | global batch size: 256 | lm loss: 2.417520E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.743 | TFLOPs: 22.79 | 31: iteration 370300/ 476837 | consumed samples: 94796800 | consumed tokens: 194143846400 | elapsed time per iteration (s): 0.68 | learning rate: 4.169E-05 | global batch size: 256 | lm loss: 2.422106E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.713 | TFLOPs: 22.79 | 31: iteration 370400/ 476837 | consumed samples: 94822400 | consumed tokens: 194196275200 | elapsed time per iteration (s): 0.68 | learning rate: 4.165E-05 | global batch size: 256 | lm loss: 2.421055E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.965 | TFLOPs: 22.74 | 31: iteration 370500/ 476837 | consumed samples: 94848000 | consumed tokens: 194248704000 | elapsed time per iteration (s): 0.68 | learning rate: 4.161E-05 | global batch size: 256 | lm loss: 2.418170E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.419 | TFLOPs: 22.77 | 31: iteration 370600/ 476837 | consumed samples: 94873600 | consumed tokens: 194301132800 | elapsed time per iteration (s): 0.68 | learning rate: 4.157E-05 | global batch size: 256 | lm loss: 2.415728E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 375.927 | TFLOPs: 22.74 | 31: iteration 370700/ 476837 | consumed samples: 94899200 | consumed tokens: 194353561600 | elapsed time per iteration (s): 0.68 | learning rate: 4.153E-05 | global batch size: 256 | lm loss: 2.419001E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.566 | TFLOPs: 22.78 | 31: iteration 370800/ 476837 | consumed samples: 94924800 | consumed tokens: 194405990400 | elapsed time per iteration (s): 0.68 | learning rate: 4.149E-05 | global batch size: 256 | lm loss: 2.422250E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.715 | TFLOPs: 22.79 | 31: iteration 370900/ 476837 | consumed samples: 94950400 | consumed tokens: 194458419200 | elapsed time per iteration (s): 0.68 | learning rate: 4.146E-05 | global batch size: 256 | lm loss: 2.415925E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.382 | TFLOPs: 22.77 | 31: iteration 371000/ 476837 | consumed samples: 94976000 | consumed tokens: 194510848000 | elapsed time per iteration (s): 0.68 | learning rate: 4.142E-05 | global batch size: 256 | lm loss: 2.418296E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.599 | TFLOPs: 22.78 | 31: iteration 371100/ 476837 | consumed samples: 95001600 | consumed tokens: 194563276800 | elapsed time per iteration (s): 0.80 | learning rate: 4.138E-05 | global batch size: 256 | lm loss: 2.418395E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 319.308 | TFLOPs: 19.32 | 31: iteration 371200/ 476837 | consumed samples: 95027200 | consumed tokens: 194615705600 | elapsed time per iteration (s): 0.89 | learning 
rate: 4.134E-05 | global batch size: 256 | lm loss: 2.419614E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 288.857 | TFLOPs: 17.48 | 31: iteration 371300/ 476837 | consumed samples: 95052800 | consumed tokens: 194668134400 | elapsed time per iteration (s): 0.69 | learning rate: 4.130E-05 | global batch size: 256 | lm loss: 2.415050E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.328 | TFLOPs: 22.59 | 31: iteration 371400/ 476837 | consumed samples: 95078400 | consumed tokens: 194720563200 | elapsed time per iteration (s): 0.68 | learning rate: 4.126E-05 | global batch size: 256 | lm loss: 2.414393E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.167 | TFLOPs: 22.76 | 31: iteration 371500/ 476837 | consumed samples: 95104000 | consumed tokens: 194772992000 | elapsed time per iteration (s): 0.68 | learning rate: 4.122E-05 | global batch size: 256 | lm loss: 2.417144E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.077 | TFLOPs: 22.75 | 31: iteration 371600/ 476837 | consumed samples: 95129600 | consumed tokens: 194825420800 | elapsed time per iteration (s): 0.68 | learning rate: 4.118E-05 | global batch size: 256 | lm loss: 2.419807E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.242 | TFLOPs: 22.70 | 31: iteration 371700/ 476837 | consumed samples: 95155200 | consumed tokens: 194877849600 | elapsed time per iteration (s): 0.68 | learning rate: 4.115E-05 | global batch size: 256 | lm loss: 2.414797E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.217 | TFLOPs: 22.70 | 31: iteration 371800/ 
476837 | consumed samples: 95180800 | consumed tokens: 194930278400 | elapsed time per iteration (s): 0.68 | learning rate: 4.111E-05 | global batch size: 256 | lm loss: 2.418521E+00 | grad norm: 0.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.527 | TFLOPs: 22.72 | 31: iteration 371900/ 476837 | consumed samples: 95206400 | consumed tokens: 194982707200 | elapsed time per iteration (s): 0.68 | learning rate: 4.107E-05 | global batch size: 256 | lm loss: 2.418136E+00 | grad norm: 0.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.920 | TFLOPs: 22.68 | 0: [2023-04-28 20:29:15,983] [INFO] [logging.py:68:log_dist] [Rank 0] step=372000, skipped=0, lr=[4.10302064927967e-05, 4.10302064927967e-05, 4.10302064927967e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 372000/ 476837 | consumed samples: 95232000 | consumed tokens: 195035136000 | elapsed time per iteration (s): 0.68 | learning rate: 4.103E-05 | global batch size: 256 | lm loss: 2.416607E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.680 | TFLOPs: 22.73 | 0: steps: 372000 loss: 2.4626 iter time (s): 0.694 samples/sec: 368.899 31: iteration 372100/ 476837 | consumed samples: 95257600 | consumed tokens: 195087564800 | elapsed time per iteration (s): 0.68 | learning rate: 4.099E-05 | global batch size: 256 | lm loss: 2.417390E+00 | grad norm: 0.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.153 | TFLOPs: 22.76 | 31: iteration 372200/ 476837 | consumed samples: 95283200 | consumed tokens: 195139993600 | elapsed time per iteration (s): 0.68 | learning rate: 4.095E-05 | global batch size: 256 | lm loss: 2.416191E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 374.821 | TFLOPs: 22.68 | 31: iteration 372300/ 476837 | consumed samples: 95308800 | consumed tokens: 195192422400 | elapsed time per iteration (s): 0.68 | learning rate: 4.091E-05 | global batch size: 256 | lm loss: 2.419111E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.950 | TFLOPs: 22.74 | 31: iteration 372400/ 476837 | consumed samples: 95334400 | consumed tokens: 195244851200 | elapsed time per iteration (s): 0.68 | learning rate: 4.088E-05 | global batch size: 256 | lm loss: 2.419849E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.577 | TFLOPs: 22.72 | 31: iteration 372500/ 476837 | consumed samples: 95360000 | consumed tokens: 195297280000 | elapsed time per iteration (s): 0.68 | learning rate: 4.084E-05 | global batch size: 256 | lm loss: 2.420641E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.652 | TFLOPs: 22.79 | 31: iteration 372600/ 476837 | consumed samples: 95385600 | consumed tokens: 195349708800 | elapsed time per iteration (s): 0.68 | learning rate: 4.080E-05 | global batch size: 256 | lm loss: 2.416109E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.712 | TFLOPs: 22.73 | 31: iteration 372700/ 476837 | consumed samples: 95411200 | consumed tokens: 195402137600 | elapsed time per iteration (s): 0.68 | learning rate: 4.076E-05 | global batch size: 256 | lm loss: 2.418924E+00 | grad norm: 0.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.264 | TFLOPs: 22.76 | 31: iteration 372800/ 476837 | consumed samples: 95436800 | consumed tokens: 195454566400 | elapsed time per iteration (s): 0.68 | learning rate: 4.072E-05 | global batch size: 256 | lm loss: 
2.418087E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.777 | TFLOPs: 22.79 | 31: iteration 372900/ 476837 | consumed samples: 95462400 | consumed tokens: 195506995200 | elapsed time per iteration (s): 0.68 | learning rate: 4.069E-05 | global batch size: 256 | lm loss: 2.415598E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.621 | TFLOPs: 22.78 | 31: iteration 373000/ 476837 | consumed samples: 95488000 | consumed tokens: 195559424000 | elapsed time per iteration (s): 0.68 | learning rate: 4.065E-05 | global batch size: 256 | lm loss: 2.416221E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.539 | TFLOPs: 22.78 | 31: iteration 373100/ 476837 | consumed samples: 95513600 | consumed tokens: 195611852800 | elapsed time per iteration (s): 0.69 | learning rate: 4.061E-05 | global batch size: 256 | lm loss: 2.415635E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.220 | TFLOPs: 22.58 | 31: iteration 373200/ 476837 | consumed samples: 95539200 | consumed tokens: 195664281600 | elapsed time per iteration (s): 0.68 | learning rate: 4.057E-05 | global batch size: 256 | lm loss: 2.409846E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.882 | TFLOPs: 22.74 | 31: iteration 373300/ 476837 | consumed samples: 95564800 | consumed tokens: 195716710400 | elapsed time per iteration (s): 0.68 | learning rate: 4.053E-05 | global batch size: 256 | lm loss: 2.414972E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.628 | TFLOPs: 22.79 | 31: iteration 373400/ 476837 | consumed samples: 95590400 | consumed 
tokens: 195769139200 | elapsed time per iteration (s): 0.68 | learning rate: 4.049E-05 | global batch size: 256 | lm loss: 2.417549E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.431 | TFLOPs: 22.77 | 31: iteration 373500/ 476837 | consumed samples: 95616000 | consumed tokens: 195821568000 | elapsed time per iteration (s): 0.68 | learning rate: 4.046E-05 | global batch size: 256 | lm loss: 2.414031E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.485 | TFLOPs: 22.78 | 31: iteration 373600/ 476837 | consumed samples: 95641600 | consumed tokens: 195873996800 | elapsed time per iteration (s): 0.68 | learning rate: 4.042E-05 | global batch size: 256 | lm loss: 2.415763E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.746 | TFLOPs: 22.73 | 31: iteration 373700/ 476837 | consumed samples: 95667200 | consumed tokens: 195926425600 | elapsed time per iteration (s): 0.68 | learning rate: 4.038E-05 | global batch size: 256 | lm loss: 2.421692E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.257 | TFLOPs: 22.76 | 31: iteration 373800/ 476837 | consumed samples: 95692800 | consumed tokens: 195978854400 | elapsed time per iteration (s): 0.68 | learning rate: 4.034E-05 | global batch size: 256 | lm loss: 2.413896E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.240 | TFLOPs: 22.76 | 31: iteration 373900/ 476837 | consumed samples: 95718400 | consumed tokens: 196031283200 | elapsed time per iteration (s): 0.68 | learning rate: 4.030E-05 | global batch size: 256 | lm loss: 2.411642E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 376.020 | TFLOPs: 22.75 | 0: [2023-04-28 20:51:57,624] [INFO] [logging.py:68:log_dist] [Rank 0] step=374000, skipped=0, lr=[4.026675844695113e-05, 4.026675844695113e-05, 4.026675844695113e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 374000/ 476837 | consumed samples: 95744000 | consumed tokens: 196083712000 | elapsed time per iteration (s): 0.68 | learning rate: 4.027E-05 | global batch size: 256 | lm loss: 2.418258E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.395 | TFLOPs: 22.77 | 0: steps: 374000 loss: 2.4279 iter time (s): 0.678 samples/sec: 377.688 31: iteration 374100/ 476837 | consumed samples: 95769600 | consumed tokens: 196136140800 | elapsed time per iteration (s): 0.68 | learning rate: 4.023E-05 | global batch size: 256 | lm loss: 2.411282E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.371 | TFLOPs: 22.77 | 31: iteration 374200/ 476837 | consumed samples: 95795200 | consumed tokens: 196188569600 | elapsed time per iteration (s): 0.68 | learning rate: 4.019E-05 | global batch size: 256 | lm loss: 2.416803E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.181 | TFLOPs: 22.76 | 31: iteration 374300/ 476837 | consumed samples: 95820800 | consumed tokens: 196240998400 | elapsed time per iteration (s): 0.68 | learning rate: 4.015E-05 | global batch size: 256 | lm loss: 2.414071E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.691 | TFLOPs: 22.79 | 31: iteration 374400/ 476837 | consumed samples: 95846400 | consumed tokens: 196293427200 | elapsed time per iteration (s): 0.68 | learning rate: 4.012E-05 | global batch size: 256 | lm loss: 2.415740E+00 | grad norm: 0.444 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.788 | TFLOPs: 22.79 | 31: iteration 374500/ 476837 | consumed samples: 95872000 | consumed tokens: 196345856000 | elapsed time per iteration (s): 0.68 | learning rate: 4.008E-05 | global batch size: 256 | lm loss: 2.414354E+00 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.099 | TFLOPs: 22.75 | 31: iteration 374600/ 476837 | consumed samples: 95897600 | consumed tokens: 196398284800 | elapsed time per iteration (s): 0.69 | learning rate: 4.004E-05 | global batch size: 256 | lm loss: 2.418749E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.945 | TFLOPs: 22.56 | 31: iteration 374700/ 476837 | consumed samples: 95923200 | consumed tokens: 196450713600 | elapsed time per iteration (s): 0.68 | learning rate: 4.000E-05 | global batch size: 256 | lm loss: 2.413558E+00 | grad norm: 0.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.652 | TFLOPs: 22.67 | 31: iteration 374800/ 476837 | consumed samples: 95948800 | consumed tokens: 196503142400 | elapsed time per iteration (s): 0.79 | learning rate: 3.996E-05 | global batch size: 256 | lm loss: 2.414537E+00 | grad norm: 0.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 323.856 | TFLOPs: 19.59 | 31: iteration 374900/ 476837 | consumed samples: 95974400 | consumed tokens: 196555571200 | elapsed time per iteration (s): 0.90 | learning rate: 3.993E-05 | global batch size: 256 | lm loss: 2.416256E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 285.492 | TFLOPs: 17.27 | 31: iteration 375000/ 476837 | consumed samples: 96000000 | consumed tokens: 196608000000 | elapsed time per iteration (s): 0.68 | 
learning rate: 3.989E-05 | global batch size: 256 | lm loss: 2.411627E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.842 | TFLOPs: 22.62 | 31: iteration 375100/ 476837 | consumed samples: 96025600 | consumed tokens: 196660428800 | elapsed time per iteration (s): 0.68 | learning rate: 3.985E-05 | global batch size: 256 | lm loss: 2.419084E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.769 | TFLOPs: 22.67 | 31: iteration 375200/ 476837 | consumed samples: 96051200 | consumed tokens: 196712857600 | elapsed time per iteration (s): 0.68 | learning rate: 3.981E-05 | global batch size: 256 | lm loss: 2.417321E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.531 | TFLOPs: 22.72 | 31: iteration 375300/ 476837 | consumed samples: 96076800 | consumed tokens: 196765286400 | elapsed time per iteration (s): 0.68 | learning rate: 3.978E-05 | global batch size: 256 | lm loss: 2.410263E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.790 | TFLOPs: 22.73 | 31: iteration 375400/ 476837 | consumed samples: 96102400 | consumed tokens: 196817715200 | elapsed time per iteration (s): 0.68 | learning rate: 3.974E-05 | global batch size: 256 | lm loss: 2.413572E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.662 | TFLOPs: 22.79 | 31: iteration 375500/ 476837 | consumed samples: 96128000 | consumed tokens: 196870144000 | elapsed time per iteration (s): 0.68 | learning rate: 3.970E-05 | global batch size: 256 | lm loss: 2.415261E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.592 | TFLOPs: 22.78 | 31: 
iteration 375600/ 476837 | consumed samples: 96153600 | consumed tokens: 196922572800 | elapsed time per iteration (s): 0.68 | learning rate: 3.966E-05 | global batch size: 256 | lm loss: 2.411492E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.695 | TFLOPs: 22.73 | 31: iteration 375700/ 476837 | consumed samples: 96179200 | consumed tokens: 196975001600 | elapsed time per iteration (s): 0.68 | learning rate: 3.963E-05 | global batch size: 256 | lm loss: 2.415239E+00 | grad norm: 0.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.855 | TFLOPs: 22.68 | 31: iteration 375800/ 476837 | consumed samples: 96204800 | consumed tokens: 197027430400 | elapsed time per iteration (s): 0.68 | learning rate: 3.959E-05 | global batch size: 256 | lm loss: 2.415033E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.483 | TFLOPs: 22.78 | 31: iteration 375900/ 476837 | consumed samples: 96230400 | consumed tokens: 197079859200 | elapsed time per iteration (s): 0.68 | learning rate: 3.955E-05 | global batch size: 256 | lm loss: 2.413834E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.580 | TFLOPs: 22.78 | 0: [2023-04-28 21:15:12,908] [INFO] [logging.py:68:log_dist] [Rank 0] step=376000, skipped=0, lr=[3.951566369015434e-05, 3.951566369015434e-05, 3.951566369015434e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 376000/ 476837 | consumed samples: 96256000 | consumed tokens: 197132288000 | elapsed time per iteration (s): 0.68 | learning rate: 3.952E-05 | global batch size: 256 | lm loss: 2.420125E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.826 | TFLOPs: 22.74 | 0: steps: 376000 loss: 
2.4453 iter time (s): 0.694 samples/sec: 368.670 31: iteration 376100/ 476837 | consumed samples: 96281600 | consumed tokens: 197184716800 | elapsed time per iteration (s): 0.68 | learning rate: 3.948E-05 | global batch size: 256 | lm loss: 2.417951E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.493 | TFLOPs: 22.78 | 31: iteration 376200/ 476837 | consumed samples: 96307200 | consumed tokens: 197237145600 | elapsed time per iteration (s): 0.68 | learning rate: 3.944E-05 | global batch size: 256 | lm loss: 2.414154E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.233 | TFLOPs: 22.76 | 31: iteration 376300/ 476837 | consumed samples: 96332800 | consumed tokens: 197289574400 | elapsed time per iteration (s): 0.68 | learning rate: 3.940E-05 | global batch size: 256 | lm loss: 2.408024E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.522 | TFLOPs: 22.78 | 31: iteration 376400/ 476837 | consumed samples: 96358400 | consumed tokens: 197342003200 | elapsed time per iteration (s): 0.68 | learning rate: 3.937E-05 | global batch size: 256 | lm loss: 2.411583E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.608 | TFLOPs: 22.78 | 31: iteration 376500/ 476837 | consumed samples: 96384000 | consumed tokens: 197394432000 | elapsed time per iteration (s): 0.68 | learning rate: 3.933E-05 | global batch size: 256 | lm loss: 2.414534E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.341 | TFLOPs: 22.77 | 31: iteration 376600/ 476837 | consumed samples: 96409600 | consumed tokens: 197446860800 | elapsed time per iteration (s): 0.68 | learning rate: 3.929E-05 | global batch size: 256 | lm 
loss: 2.413713E+00 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.206 | TFLOPs: 22.76 | 31: iteration 376700/ 476837 | consumed samples: 96435200 | consumed tokens: 197499289600 | elapsed time per iteration (s): 0.68 | learning rate: 3.926E-05 | global batch size: 256 | lm loss: 2.416461E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.206 | TFLOPs: 22.76 | 31: iteration 376800/ 476837 | consumed samples: 96460800 | consumed tokens: 197551718400 | elapsed time per iteration (s): 0.68 | learning rate: 3.922E-05 | global batch size: 256 | lm loss: 2.412097E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.000 | TFLOPs: 22.75 | 31: iteration 376900/ 476837 | consumed samples: 96486400 | consumed tokens: 197604147200 | elapsed time per iteration (s): 0.68 | learning rate: 3.918E-05 | global batch size: 256 | lm loss: 2.414314E+00 | grad norm: 0.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.767 | TFLOPs: 22.79 | 31: iteration 377000/ 476837 | consumed samples: 96512000 | consumed tokens: 197656576000 | elapsed time per iteration (s): 0.68 | learning rate: 3.914E-05 | global batch size: 256 | lm loss: 2.410245E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.526 | TFLOPs: 22.72 | 31: iteration 377100/ 476837 | consumed samples: 96537600 | consumed tokens: 197709004800 | elapsed time per iteration (s): 0.68 | learning rate: 3.911E-05 | global batch size: 256 | lm loss: 2.417926E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.981 | TFLOPs: 22.75 | 31: iteration 377200/ 476837 | consumed samples: 96563200 | 
consumed tokens: 197761433600 | elapsed time per iteration (s): 0.68 | learning rate: 3.907E-05 | global batch size: 256 | lm loss: 2.413716E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.536 | TFLOPs: 22.72 | 31: iteration 377300/ 476837 | consumed samples: 96588800 | consumed tokens: 197813862400 | elapsed time per iteration (s): 0.68 | learning rate: 3.903E-05 | global batch size: 256 | lm loss: 2.415851E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.821 | TFLOPs: 22.74 | 31: iteration 377400/ 476837 | consumed samples: 96614400 | consumed tokens: 197866291200 | elapsed time per iteration (s): 0.68 | learning rate: 3.900E-05 | global batch size: 256 | lm loss: 2.414126E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.398 | TFLOPs: 22.71 | 31: iteration 377500/ 476837 | consumed samples: 96640000 | consumed tokens: 197918720000 | elapsed time per iteration (s): 0.68 | learning rate: 3.896E-05 | global batch size: 256 | lm loss: 2.410812E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.823 | TFLOPs: 22.74 | 31: iteration 377600/ 476837 | consumed samples: 96665600 | consumed tokens: 197971148800 | elapsed time per iteration (s): 0.68 | learning rate: 3.892E-05 | global batch size: 256 | lm loss: 2.414586E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.772 | TFLOPs: 22.67 | 31: iteration 377700/ 476837 | consumed samples: 96691200 | consumed tokens: 198023577600 | elapsed time per iteration (s): 0.68 | learning rate: 3.889E-05 | global batch size: 256 | lm loss: 2.417245E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 375.581 | TFLOPs: 22.72 | 31: iteration 377800/ 476837 | consumed samples: 96716800 | consumed tokens: 198076006400 | elapsed time per iteration (s): 0.68 | learning rate: 3.885E-05 | global batch size: 256 | lm loss: 2.416546E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.996 | TFLOPs: 22.75 | 31: iteration 377900/ 476837 | consumed samples: 96742400 | consumed tokens: 198128435200 | elapsed time per iteration (s): 0.68 | learning rate: 3.881E-05 | global batch size: 256 | lm loss: 2.412991E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.459 | TFLOPs: 22.65 | 0: [2023-04-28 21:37:54,872] [INFO] [logging.py:68:log_dist] [Rank 0] step=378000, skipped=0, lr=[3.877705527933047e-05, 3.877705527933047e-05, 3.877705527933047e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 378000/ 476837 | consumed samples: 96768000 | consumed tokens: 198180864000 | elapsed time per iteration (s): 0.68 | learning rate: 3.878E-05 | global batch size: 256 | lm loss: 2.410985E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.305 | TFLOPs: 22.77 | 0: steps: 378000 loss: 2.3584 iter time (s): 0.677 samples/sec: 377.883 31: iteration 378100/ 476837 | consumed samples: 96793600 | consumed tokens: 198233292800 | elapsed time per iteration (s): 0.68 | learning rate: 3.874E-05 | global batch size: 256 | lm loss: 2.411363E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.948 | TFLOPs: 22.68 | 31: iteration 378200/ 476837 | consumed samples: 96819200 | consumed tokens: 198285721600 | elapsed time per iteration (s): 0.68 | learning rate: 3.870E-05 | global batch size: 256 | lm loss: 2.412923E+00 | grad norm: 0.409 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.598 | TFLOPs: 22.78 | 31: iteration 378300/ 476837 | consumed samples: 96844800 | consumed tokens: 198338150400 | elapsed time per iteration (s): 0.68 | learning rate: 3.867E-05 | global batch size: 256 | lm loss: 2.414499E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.789 | TFLOPs: 22.73 | 31: iteration 378400/ 476837 | consumed samples: 96870400 | consumed tokens: 198390579200 | elapsed time per iteration (s): 0.68 | learning rate: 3.863E-05 | global batch size: 256 | lm loss: 2.406983E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.482 | TFLOPs: 22.78 | 31: iteration 378500/ 476837 | consumed samples: 96896000 | consumed tokens: 198443008000 | elapsed time per iteration (s): 0.69 | learning rate: 3.859E-05 | global batch size: 256 | lm loss: 2.412822E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.143 | TFLOPs: 22.51 | 31: iteration 378600/ 476837 | consumed samples: 96921600 | consumed tokens: 198495436800 | elapsed time per iteration (s): 1.03 | learning rate: 3.856E-05 | global batch size: 256 | lm loss: 2.410096E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 248.739 | TFLOPs: 15.05 | 31: iteration 378700/ 476837 | consumed samples: 96947200 | consumed tokens: 198547865600 | elapsed time per iteration (s): 0.71 | learning rate: 3.852E-05 | global batch size: 256 | lm loss: 2.406981E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 359.607 | TFLOPs: 21.76 | 31: iteration 378800/ 476837 | consumed samples: 96972800 | consumed tokens: 198600294400 | elapsed time per iteration 
(s): 0.69 | learning rate: 3.849E-05 | global batch size: 256 | lm loss: 2.410663E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.216 | TFLOPs: 22.58 | 31: iteration 378900/ 476837 | consumed samples: 96998400 | consumed tokens: 198652723200 | elapsed time per iteration (s): 0.68 | learning rate: 3.845E-05 | global batch size: 256 | lm loss: 2.409869E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.639 | TFLOPs: 22.79 | 31: iteration 379000/ 476837 | consumed samples: 97024000 | consumed tokens: 198705152000 | elapsed time per iteration (s): 0.68 | learning rate: 3.841E-05 | global batch size: 256 | lm loss: 2.410907E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.883 | TFLOPs: 22.74 | 31: iteration 379100/ 476837 | consumed samples: 97049600 | consumed tokens: 198757580800 | elapsed time per iteration (s): 0.68 | learning rate: 3.838E-05 | global batch size: 256 | lm loss: 2.414974E+00 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.819 | TFLOPs: 22.80 | 31: iteration 379200/ 476837 | consumed samples: 97075200 | consumed tokens: 198810009600 | elapsed time per iteration (s): 0.68 | learning rate: 3.834E-05 | global batch size: 256 | lm loss: 2.412988E+00 | grad norm: 0.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.591 | TFLOPs: 22.72 | 31: iteration 379300/ 476837 | consumed samples: 97100800 | consumed tokens: 198862438400 | elapsed time per iteration (s): 0.68 | learning rate: 3.830E-05 | global batch size: 256 | lm loss: 2.411200E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.376 | TFLOPs: 22.77 | 
31: iteration 379400/ 476837 | consumed samples: 97126400 | consumed tokens: 198914867200 | elapsed time per iteration (s): 0.68 | learning rate: 3.827E-05 | global batch size: 256 | lm loss: 2.415492E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.611 | TFLOPs: 22.78 | 31: iteration 379500/ 476837 | consumed samples: 97152000 | consumed tokens: 198967296000 | elapsed time per iteration (s): 0.68 | learning rate: 3.823E-05 | global batch size: 256 | lm loss: 2.414950E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.937 | TFLOPs: 22.74 | 31: iteration 379600/ 476837 | consumed samples: 97177600 | consumed tokens: 199019724800 | elapsed time per iteration (s): 0.68 | learning rate: 3.820E-05 | global batch size: 256 | lm loss: 2.417423E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.646 | TFLOPs: 22.79 | 31: iteration 379700/ 476837 | consumed samples: 97203200 | consumed tokens: 199072153600 | elapsed time per iteration (s): 0.68 | learning rate: 3.816E-05 | global batch size: 256 | lm loss: 2.410453E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.618 | TFLOPs: 22.78 | 31: iteration 379800/ 476837 | consumed samples: 97228800 | consumed tokens: 199124582400 | elapsed time per iteration (s): 0.68 | learning rate: 3.812E-05 | global batch size: 256 | lm loss: 2.412042E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.905 | TFLOPs: 22.74 | 31: iteration 379900/ 476837 | consumed samples: 97254400 | consumed tokens: 199177011200 | elapsed time per iteration (s): 0.68 | learning rate: 3.809E-05 | global batch size: 256 | lm loss: 2.407900E+00 | grad norm: 0.461 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.569 | TFLOPs: 22.78 | 0: [2023-04-28 22:01:14,992] [INFO] [logging.py:68:log_dist] [Rank 0] step=380000, skipped=0, lr=[3.805106405943947e-05, 3.805106405943947e-05, 3.805106405943947e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 380000/ 476837 | consumed samples: 97280000 | consumed tokens: 199229440000 | elapsed time per iteration (s): 0.68 | learning rate: 3.805E-05 | global batch size: 256 | lm loss: 2.409020E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.654 | TFLOPs: 22.79 | 0: steps: 380000 loss: 2.4104 iter time (s): 0.697 samples/sec: 367.472 31: ------------------------------------------------------------------------------------------------- 31: validation loss at iteration 380000 | lm loss value: 2.942323E+00 | lm loss PPL: 1.895984E+01 | 31: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 380000 to checkpoints_1b1250b1b5 0: [2023-04-28 22:01:15,277] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step380000 is begin to save! 0: [2023-04-28 22:01:15,283] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_01-model_00-model_states.pt... 0: [2023-04-28 22:01:15,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_01-model_00-model_states.pt. 0: [2023-04-28 22:01:15,517] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_03-model_00-model_states.pt... 0: [2023-04-28 22:01:15,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_03-model_00-model_states.pt. 
0: [2023-04-28 22:01:15,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_04-model_00-model_states.pt... 0: [2023-04-28 22:01:15,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_04-model_00-model_states.pt. 0: [2023-04-28 22:01:15,714] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_05-model_00-model_states.pt... 0: [2023-04-28 22:01:15,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_05-model_00-model_states.pt. 0: [2023-04-28 22:01:15,806] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_06-model_00-model_states.pt... 0: [2023-04-28 22:01:15,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_06-model_00-model_states.pt. 0: [2023-04-28 22:01:15,899] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_07-model_00-model_states.pt... 0: [2023-04-28 22:01:15,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_07-model_00-model_states.pt. 0: [2023-04-28 22:01:15,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_08-model_00-model_states.pt... 0: [2023-04-28 22:01:16,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_08-model_00-model_states.pt. 0: [2023-04-28 22:01:16,069] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_09-model_00-model_states.pt... 0: [2023-04-28 22:01:16,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_09-model_00-model_states.pt. 
0: [2023-04-28 22:01:16,163] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_10-model_00-model_states.pt... 0: [2023-04-28 22:01:16,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_10-model_00-model_states.pt. 0: [2023-04-28 22:01:16,258] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_11-model_00-model_states.pt... 0: [2023-04-28 22:01:16,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_11-model_00-model_states.pt. 0: [2023-04-28 22:01:16,350] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_12-model_00-model_states.pt... 0: [2023-04-28 22:01:16,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_12-model_00-model_states.pt. 0: [2023-04-28 22:01:16,443] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_13-model_00-model_states.pt... 0: [2023-04-28 22:01:16,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_13-model_00-model_states.pt. 0: [2023-04-28 22:01:16,546] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_14-model_00-model_states.pt... 0: [2023-04-28 22:01:16,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_14-model_00-model_states.pt. 0: [2023-04-28 22:01:16,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_15-model_00-model_states.pt... 0: [2023-04-28 22:01:16,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_15-model_00-model_states.pt. 
0: [2023-04-28 22:01:16,740] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_16-model_00-model_states.pt... 0: [2023-04-28 22:01:16,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_16-model_00-model_states.pt. 0: [2023-04-28 22:01:16,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_17-model_00-model_states.pt... 0: [2023-04-28 22:01:16,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_17-model_00-model_states.pt. 0: [2023-04-28 22:01:16,927] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_18-model_00-model_states.pt... 0: [2023-04-28 22:01:17,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_18-model_00-model_states.pt. 0: [2023-04-28 22:01:17,023] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_19-model_00-model_states.pt... 0: [2023-04-28 22:01:17,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_19-model_00-model_states.pt. 0: [2023-04-28 22:01:17,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_20-model_00-model_states.pt... 0: [2023-04-28 22:01:17,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_20-model_00-model_states.pt. 0: [2023-04-28 22:01:17,203] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_21-model_00-model_states.pt... 0: [2023-04-28 22:01:17,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_21-model_00-model_states.pt. 
0: [2023-04-28 22:01:17,298] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_22-model_00-model_states.pt... 0: [2023-04-28 22:01:17,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_22-model_00-model_states.pt. 0: [2023-04-28 22:01:17,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_23-model_00-model_states.pt... 0: [2023-04-28 22:01:17,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_23-model_00-model_states.pt. 0: [2023-04-28 22:01:17,482] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_24-model_00-model_states.pt... 0: [2023-04-28 22:01:17,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_24-model_00-model_states.pt. 0: [2023-04-28 22:01:17,572] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_25-model_00-model_states.pt... 0: [2023-04-28 22:01:17,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_25-model_00-model_states.pt. 0: [2023-04-28 22:01:17,661] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_26-model_00-model_states.pt... 0: [2023-04-28 22:01:17,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_26-model_00-model_states.pt. 0: [2023-04-28 22:01:17,754] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_27-model_00-model_states.pt... 0: [2023-04-28 22:01:17,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_27-model_00-model_states.pt. 
0: [2023-04-28 22:01:17,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_28-model_00-model_states.pt... 0: [2023-04-28 22:01:17,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_28-model_00-model_states.pt. 0: [2023-04-28 22:01:17,933] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/layer_30-model_00-model_states.pt... 0: [2023-04-28 22:01:17,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/layer_30-model_00-model_states.pt. 0: [2023-04-28 22:01:17,938] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_1b1250b1b5/global_step380000/mp_rank_00_model_states.pt 0: [2023-04-28 22:01:17,938] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/mp_rank_00_model_states.pt... 0: [2023-04-28 22:01:17,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/mp_rank_00_model_states.pt. 0: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
1: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 3: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 9: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt... 14: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt... 15: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt... 15: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt... 15: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt... 15: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt... 13: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt... 13: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt... 
24: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt... 24: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt... 24: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt... 24: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt... 28: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt... 28: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt... 31: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt... 31: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt... 31: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt... 16: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt... 
16: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt... 16: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt... 16: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt... 0: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 5: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 2: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 8: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt... 8: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt... 11: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt... 
11: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt... 11: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt... 3: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 10: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt... 10: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt... 9: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt... 9: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt... 14: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt... 12: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt... 12: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt... 12: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt... 
13: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt... 13: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt... 20: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt... 20: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt... 20: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt... 19: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt... 19: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt... 19: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt... 18: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt... 18: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt... 
18: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt... 17: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt... 17: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt... 17: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt... 27: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt... 27: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt... 27: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt... 21: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt... 21: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt... 23: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt... 
23: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt... 29: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt... 25: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt... 25: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt... 25: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt... 25: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt... 28: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt... 26: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt... 26: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt... 30: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt... 
30: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt... 30: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt... 31: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt... 16: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt... 16: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt... 16: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt... 16: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt... 22: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt... 22: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt... 22: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt... 
22: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt... 6: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 4: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 8: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt... 8: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt... 8: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt... 8: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt... 8: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt... 11: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt... 3: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 10: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt... 10: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt... 9: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt... 
14: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt... 15: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt... 15: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt... 15: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt... 12: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt... 13: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt... 13: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt... 20: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt... 20: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt... 19: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt... 
18: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt... 24: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt... 24: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt... 24: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt... 17: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt... 27: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt... 21: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt... 23: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt... 23: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt... 29: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt... 
29: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt... 25: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt... 25: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt... 28: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt... 28: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt... 26: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt... 30: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt... 31: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt... 31: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt... 31: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt... 
22: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt... 22: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt... 6: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 4: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 1: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 2: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 8: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt... 
11: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt... 11: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt... 11: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt... 3: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 10: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt... 9: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt... 9: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt... 9: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt... 14: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt... 15: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt... 12: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt... 
12: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt... 12: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt... 13: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt... 20: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt... 20: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt... 20: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt... 19: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt... 19: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt... 19: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt... 19: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt... 
18: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt... 18: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt... 24: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt... 17: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt... 17: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt... 27: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt... 21: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt... 23: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt... 29: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt... 29: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt... 
29: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt... 25: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt... 25: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt... 28: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt... 28: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt... 26: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt... 30: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt... 30: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt... 31: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt... 22: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt... 6: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 4: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 1: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 2: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 11: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt... 3: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 10: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt... 
10: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt... 10: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt... 9: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt... 14: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt... 14: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt... 14: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt... 12: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt... 13: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt... 18: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt... 18: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt... 17: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt... 
27: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt... 21: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt... 21: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt... 21: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt... 23: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt... 29: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt... 29: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt... 28: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt... 26: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt... 26: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt... 
26: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt... 26: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt... 30: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt... 30: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt... 22: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt... 6: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 1: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
3: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 14: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt... 17: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt... 27: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt... 21: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt... 23: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt... 6: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 3: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 27: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt... 23: [2023-04-28 22:01:18,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt... 0: [2023-04-28 22:01:18,072] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-04-28 22:01:18,072] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-04-28 22:01:18,072] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 21: [2023-04-28 22:01:18,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt. 21: [2023-04-28 22:01:18,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_171_mp_rank_00_optim_states.pt 21: [2023-04-28 22:01:18,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 14: [2023-04-28 22:01:18,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt. 14: [2023-04-28 22:01:18,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_117_mp_rank_00_optim_states.pt 14: [2023-04-28 22:01:18,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 31: [2023-04-28 22:01:18,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt. 31: [2023-04-28 22:01:18,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_249_mp_rank_00_optim_states.pt 31: [2023-04-28 22:01:18,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 24: [2023-04-28 22:01:18,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt. 
24: [2023-04-28 22:01:18,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_193_mp_rank_00_optim_states.pt 24: [2023-04-28 22:01:18,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 7: [2023-04-28 22:01:18,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-04-28 22:01:18,107] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-04-28 22:01:18,107] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 7: [2023-04-28 22:01:18,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-04-28 22:01:18,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-04-28 22:01:18,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 24: [2023-04-28 22:01:18,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt. 24: [2023-04-28 22:01:18,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_199_mp_rank_00_optim_states.pt 24: [2023-04-28 22:01:18,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 0: [2023-04-28 22:01:18,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-04-28 22:01:18,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-04-28 22:01:18,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-04-28 22:01:18,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-04-28 22:01:18,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-04-28 22:01:18,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-04-28 22:01:18,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 0: [2023-04-28 22:01:18,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 0: [2023-04-28 22:01:18,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 0: [2023-04-28 22:01:18,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-04-28 22:01:18,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-04-28 22:01:18,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 21: [2023-04-28 22:01:18,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt. 
21: [2023-04-28 22:01:18,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_170_mp_rank_00_optim_states.pt 21: [2023-04-28 22:01:18,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 14: [2023-04-28 22:01:18,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt. 14: [2023-04-28 22:01:18,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_118_mp_rank_00_optim_states.pt 21: [2023-04-28 22:01:18,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt. 14: [2023-04-28 22:01:18,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 21: [2023-04-28 22:01:18,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_175_mp_rank_00_optim_states.pt 21: [2023-04-28 22:01:18,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 21: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt. 21: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt. 
21: [2023-04-28 22:01:18,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_174_mp_rank_00_optim_states.pt 21: [2023-04-28 22:01:18,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_173_mp_rank_00_optim_states.pt 21: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 21: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 24: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt. 24: [2023-04-28 22:01:18,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_192_mp_rank_00_optim_states.pt 24: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 14: [2023-04-28 22:01:18,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt. 14: [2023-04-28 22:01:18,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_114_mp_rank_00_optim_states.pt 14: [2023-04-28 22:01:18,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 1: [2023-04-28 22:01:18,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-04-28 22:01:18,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-04-28 22:01:18,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 1: [2023-04-28 22:01:18,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-04-28 22:01:18,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-04-28 22:01:18,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 21: [2023-04-28 22:01:18,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt. 21: [2023-04-28 22:01:18,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_172_mp_rank_00_optim_states.pt 21: [2023-04-28 22:01:18,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 7: [2023-04-28 22:01:18,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-04-28 22:01:18,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-04-28 22:01:18,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 7: [2023-04-28 22:01:18,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-04-28 22:01:18,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-04-28 22:01:18,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 7: [2023-04-28 22:01:18,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-04-28 22:01:18,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-04-28 22:01:18,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 10: [2023-04-28 22:01:18,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt. 10: [2023-04-28 22:01:18,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_82_mp_rank_00_optim_states.pt 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt. 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt. 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt. 
10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt. 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt. 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt. 10: [2023-04-28 22:01:18,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_87_mp_rank_00_optim_states.pt 10: [2023-04-28 22:01:18,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_85_mp_rank_00_optim_states.pt 10: [2023-04-28 22:01:18,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_81_mp_rank_00_optim_states.pt 10: [2023-04-28 22:01:18,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_86_mp_rank_00_optim_states.pt 10: [2023-04-28 22:01:18,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_84_mp_rank_00_optim_states.pt 10: [2023-04-28 22:01:18,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_80_mp_rank_00_optim_states.pt 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt. 10: [2023-04-28 22:01:18,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_83_mp_rank_00_optim_states.pt 10: [2023-04-28 22:01:18,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 7: [2023-04-28 22:01:18,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-04-28 22:01:18,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-04-28 22:01:18,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 14: [2023-04-28 22:01:18,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt. 14: [2023-04-28 22:01:18,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_112_mp_rank_00_optim_states.pt 14: [2023-04-28 22:01:18,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
31: [2023-04-28 22:01:18,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt. 31: [2023-04-28 22:01:18,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_255_mp_rank_00_optim_states.pt 31: [2023-04-28 22:01:18,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 1: [2023-04-28 22:01:18,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-04-28 22:01:18,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-04-28 22:01:18,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 1: [2023-04-28 22:01:18,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-04-28 22:01:18,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-04-28 22:01:18,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 14: [2023-04-28 22:01:18,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt. 14: [2023-04-28 22:01:18,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_116_mp_rank_00_optim_states.pt 14: [2023-04-28 22:01:18,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
24: [2023-04-28 22:01:18,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt. 24: [2023-04-28 22:01:18,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_197_mp_rank_00_optim_states.pt 24: [2023-04-28 22:01:18,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 12: [2023-04-28 22:01:18,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt. 12: [2023-04-28 22:01:18,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_102_mp_rank_00_optim_states.pt 12: [2023-04-28 22:01:18,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 12: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt. 12: [2023-04-28 22:01:18,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_101_mp_rank_00_optim_states.pt 12: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt. 12: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt. 12: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
12: [2023-04-28 22:01:18,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_99_mp_rank_00_optim_states.pt 12: [2023-04-28 22:01:18,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_100_mp_rank_00_optim_states.pt 12: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 12: [2023-04-28 22:01:18,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 12: [2023-04-28 22:01:18,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt. 12: [2023-04-28 22:01:18,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_96_mp_rank_00_optim_states.pt 12: [2023-04-28 22:01:18,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 12: [2023-04-28 22:01:18,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt. 12: [2023-04-28 22:01:18,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_98_mp_rank_00_optim_states.pt 12: [2023-04-28 22:01:18,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 12: [2023-04-28 22:01:18,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt. 
12: [2023-04-28 22:01:18,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_103_mp_rank_00_optim_states.pt 12: [2023-04-28 22:01:18,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 12: [2023-04-28 22:01:18,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt. 12: [2023-04-28 22:01:18,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_97_mp_rank_00_optim_states.pt 12: [2023-04-28 22:01:18,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 24: [2023-04-28 22:01:18,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt. 24: [2023-04-28 22:01:18,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_198_mp_rank_00_optim_states.pt 24: [2023-04-28 22:01:18,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 24: [2023-04-28 22:01:18,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt. 24: [2023-04-28 22:01:18,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_195_mp_rank_00_optim_states.pt 24: [2023-04-28 22:01:18,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 0: [2023-04-28 22:01:18,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-04-28 22:01:18,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-04-28 22:01:18,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 1: [2023-04-28 22:01:18,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-04-28 22:01:18,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-04-28 22:01:18,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 7: [2023-04-28 22:01:18,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-04-28 22:01:18,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-04-28 22:01:18,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 1: [2023-04-28 22:01:18,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-04-28 22:01:18,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-04-28 22:01:18,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 31: [2023-04-28 22:01:18,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt. 
31: [2023-04-28 22:01:18,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt. 31: [2023-04-28 22:01:18,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt. 31: [2023-04-28 22:01:18,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_252_mp_rank_00_optim_states.pt 31: [2023-04-28 22:01:18,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_251_mp_rank_00_optim_states.pt 31: [2023-04-28 22:01:18,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_250_mp_rank_00_optim_states.pt 31: [2023-04-28 22:01:18,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt. 31: [2023-04-28 22:01:18,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 31: [2023-04-28 22:01:18,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 31: [2023-04-28 22:01:18,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 31: [2023-04-28 22:01:18,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_253_mp_rank_00_optim_states.pt 31: [2023-04-28 22:01:18,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 7: [2023-04-28 22:01:18,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-04-28 22:01:18,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-04-28 22:01:18,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 14: [2023-04-28 22:01:18,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt. 14: [2023-04-28 22:01:18,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_115_mp_rank_00_optim_states.pt 14: [2023-04-28 22:01:18,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 14: [2023-04-28 22:01:18,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt. 14: [2023-04-28 22:01:18,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_113_mp_rank_00_optim_states.pt 14: [2023-04-28 22:01:18,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 14: [2023-04-28 22:01:18,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt. 14: [2023-04-28 22:01:18,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_119_mp_rank_00_optim_states.pt 14: [2023-04-28 22:01:18,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 21: [2023-04-28 22:01:18,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt. 
21: [2023-04-28 22:01:18,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_168_mp_rank_00_optim_states.pt 21: [2023-04-28 22:01:18,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 21: [2023-04-28 22:01:18,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt. 21: [2023-04-28 22:01:18,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_169_mp_rank_00_optim_states.pt 21: [2023-04-28 22:01:18,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 28: [2023-04-28 22:01:18,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt. 28: [2023-04-28 22:01:18,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt. 28: [2023-04-28 22:01:18,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt. 28: [2023-04-28 22:01:18,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt. 28: [2023-04-28 22:01:18,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt. 28: [2023-04-28 22:01:18,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt. 
28: [2023-04-28 22:01:18,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_226_mp_rank_00_optim_states.pt 28: [2023-04-28 22:01:18,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_225_mp_rank_00_optim_states.pt 28: [2023-04-28 22:01:18,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 28: [2023-04-28 22:01:18,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 28: [2023-04-28 22:01:18,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_231_mp_rank_00_optim_states.pt 28: [2023-04-28 22:01:18,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_224_mp_rank_00_optim_states.pt 28: [2023-04-28 22:01:18,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_228_mp_rank_00_optim_states.pt 28: [2023-04-28 22:01:18,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_229_mp_rank_00_optim_states.pt 28: [2023-04-28 22:01:18,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 28: [2023-04-28 22:01:18,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 28: [2023-04-28 22:01:18,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 28: [2023-04-28 22:01:18,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt. 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt. 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt. 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt. 11: [2023-04-28 22:01:18,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_88_mp_rank_00_optim_states.pt 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt. 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt. 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt. 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt. 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
11: [2023-04-28 22:01:18,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_91_mp_rank_00_optim_states.pt 11: [2023-04-28 22:01:18,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_95_mp_rank_00_optim_states.pt 11: [2023-04-28 22:01:18,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_94_mp_rank_00_optim_states.pt 11: [2023-04-28 22:01:18,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_92_mp_rank_00_optim_states.pt 11: [2023-04-28 22:01:18,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_89_mp_rank_00_optim_states.pt 11: [2023-04-28 22:01:18,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_93_mp_rank_00_optim_states.pt 11: [2023-04-28 22:01:18,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_90_mp_rank_00_optim_states.pt 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 11: [2023-04-28 22:01:18,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 24: [2023-04-28 22:01:18,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt. 24: [2023-04-28 22:01:18,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_196_mp_rank_00_optim_states.pt 24: [2023-04-28 22:01:18,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 24: [2023-04-28 22:01:18,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt. 24: [2023-04-28 22:01:18,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_194_mp_rank_00_optim_states.pt 24: [2023-04-28 22:01:18,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 0: [2023-04-28 22:01:18,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-04-28 22:01:18,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-04-28 22:01:18,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-04-28 22:01:18,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt. 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt. 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt. 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt. 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt. 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt. 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt. 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt. 
19: [2023-04-28 22:01:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_159_mp_rank_00_optim_states.pt 19: [2023-04-28 22:01:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_154_mp_rank_00_optim_states.pt 19: [2023-04-28 22:01:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_155_mp_rank_00_optim_states.pt 19: [2023-04-28 22:01:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_156_mp_rank_00_optim_states.pt 19: [2023-04-28 22:01:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_157_mp_rank_00_optim_states.pt 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 19: [2023-04-28 22:01:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_153_mp_rank_00_optim_states.pt 19: [2023-04-28 22:01:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_158_mp_rank_00_optim_states.pt 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
19: [2023-04-28 22:01:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_152_mp_rank_00_optim_states.pt 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 19: [2023-04-28 22:01:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 31: [2023-04-28 22:01:18,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt. 31: [2023-04-28 22:01:18,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_248_mp_rank_00_optim_states.pt 31: [2023-04-28 22:01:18,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 1: [2023-04-28 22:01:18,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-04-28 22:01:18,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-04-28 22:01:18,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-04-28 22:01:18,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-04-28 22:01:18,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-04-28 22:01:18,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-04-28 22:01:18,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-04-28 22:01:18,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-04-28 22:01:18,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-04-28 22:01:18,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-04-28 22:01:18,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 2: [2023-04-28 22:01:18,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 22: [2023-04-28 22:01:18,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt. 22: [2023-04-28 22:01:18,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt. 22: [2023-04-28 22:01:18,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt. 22: [2023-04-28 22:01:18,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt. 22: [2023-04-28 22:01:18,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt. 22: [2023-04-28 22:01:18,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt. 22: [2023-04-28 22:01:18,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt. 
22: [2023-04-28 22:01:18,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt. 22: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_181_mp_rank_00_optim_states.pt 22: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_177_mp_rank_00_optim_states.pt 22: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_178_mp_rank_00_optim_states.pt 22: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_183_mp_rank_00_optim_states.pt 22: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_180_mp_rank_00_optim_states.pt 22: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_179_mp_rank_00_optim_states.pt 22: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_176_mp_rank_00_optim_states.pt 22: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_182_mp_rank_00_optim_states.pt 22: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 22: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
22: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 22: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 22: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 22: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 22: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 22: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt. 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt. 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt. 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt. 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt. 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt. 
13: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_110_mp_rank_00_optim_states.pt 13: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_111_mp_rank_00_optim_states.pt 13: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_108_mp_rank_00_optim_states.pt 13: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_107_mp_rank_00_optim_states.pt 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 13: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_105_mp_rank_00_optim_states.pt 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 13: [2023-04-28 22:01:18,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_109_mp_rank_00_optim_states.pt 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 13: [2023-04-28 22:01:18,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 6: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 6: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 6: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
6: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 6: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt. 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt. 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt. 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt. 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt. 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt. 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt. 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt. 
23: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_189_mp_rank_00_optim_states.pt 23: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_187_mp_rank_00_optim_states.pt 23: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_184_mp_rank_00_optim_states.pt 13: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt. 13: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt. 23: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_188_mp_rank_00_optim_states.pt 23: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_191_mp_rank_00_optim_states.pt 23: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_190_mp_rank_00_optim_states.pt 23: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_186_mp_rank_00_optim_states.pt 23: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_185_mp_rank_00_optim_states.pt 13: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint 
saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_104_mp_rank_00_optim_states.pt 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 13: [2023-04-28 22:01:18,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_106_mp_rank_00_optim_states.pt 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 13: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 23: [2023-04-28 22:01:18,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 13: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt. 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt. 
30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt. 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt. 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt. 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt. 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt. 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt. 
30: [2023-04-28 22:01:18,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_247_mp_rank_00_optim_states.pt 30: [2023-04-28 22:01:18,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_243_mp_rank_00_optim_states.pt 30: [2023-04-28 22:01:18,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_246_mp_rank_00_optim_states.pt 30: [2023-04-28 22:01:18,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_242_mp_rank_00_optim_states.pt 30: [2023-04-28 22:01:18,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_240_mp_rank_00_optim_states.pt 30: [2023-04-28 22:01:18,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_245_mp_rank_00_optim_states.pt 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 30: [2023-04-28 22:01:18,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_241_mp_rank_00_optim_states.pt 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
30: [2023-04-28 22:01:18,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_244_mp_rank_00_optim_states.pt 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 30: [2023-04-28 22:01:18,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt. 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt. 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt. 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt. 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt. 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt. 
27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt. 27: [2023-04-28 22:01:18,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_217_mp_rank_00_optim_states.pt 27: [2023-04-28 22:01:18,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_223_mp_rank_00_optim_states.pt 27: [2023-04-28 22:01:18,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_219_mp_rank_00_optim_states.pt 27: [2023-04-28 22:01:18,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_216_mp_rank_00_optim_states.pt 27: [2023-04-28 22:01:18,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_221_mp_rank_00_optim_states.pt 27: [2023-04-28 22:01:18,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_218_mp_rank_00_optim_states.pt 27: [2023-04-28 22:01:18,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_222_mp_rank_00_optim_states.pt 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt. 27: [2023-04-28 22:01:18,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_220_mp_rank_00_optim_states.pt 27: [2023-04-28 22:01:18,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 28: [2023-04-28 22:01:18,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt. 28: [2023-04-28 22:01:18,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_227_mp_rank_00_optim_states.pt 28: [2023-04-28 22:01:18,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 1: [2023-04-28 22:01:18,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-04-28 22:01:18,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-04-28 22:01:18,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt. 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt. 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt. 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt. 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt. 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt. 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt. 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt. 
20: [2023-04-28 22:01:18,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_162_mp_rank_00_optim_states.pt 20: [2023-04-28 22:01:18,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_167_mp_rank_00_optim_states.pt 20: [2023-04-28 22:01:18,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_163_mp_rank_00_optim_states.pt 20: [2023-04-28 22:01:18,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_161_mp_rank_00_optim_states.pt 20: [2023-04-28 22:01:18,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_164_mp_rank_00_optim_states.pt 20: [2023-04-28 22:01:18,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_165_mp_rank_00_optim_states.pt 20: [2023-04-28 22:01:18,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_166_mp_rank_00_optim_states.pt 20: [2023-04-28 22:01:18,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_160_mp_rank_00_optim_states.pt 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 20: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt. 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt. 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt. 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt. 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt. 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt. 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt. 
29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt. 29: [2023-04-28 22:01:18,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_239_mp_rank_00_optim_states.pt 29: [2023-04-28 22:01:18,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_234_mp_rank_00_optim_states.pt 29: [2023-04-28 22:01:18,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_238_mp_rank_00_optim_states.pt 29: [2023-04-28 22:01:18,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_233_mp_rank_00_optim_states.pt 29: [2023-04-28 22:01:18,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_235_mp_rank_00_optim_states.pt 29: [2023-04-28 22:01:18,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_236_mp_rank_00_optim_states.pt 29: [2023-04-28 22:01:18,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_232_mp_rank_00_optim_states.pt 29: [2023-04-28 22:01:18,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_237_mp_rank_00_optim_states.pt 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 29: [2023-04-28 22:01:18,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 31: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt. 31: [2023-04-28 22:01:18,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_254_mp_rank_00_optim_states.pt 31: [2023-04-28 22:01:18,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 0: [2023-04-28 22:01:18,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-04-28 22:01:18,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 9: [2023-04-28 22:01:18,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt. 9: [2023-04-28 22:01:18,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt. 
9: [2023-04-28 22:01:18,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_78_mp_rank_00_optim_states.pt 9: [2023-04-28 22:01:18,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_73_mp_rank_00_optim_states.pt 9: [2023-04-28 22:01:18,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 9: [2023-04-28 22:01:18,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt. 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt. 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt. 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt. 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt. 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt. 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt. 
16: [2023-04-28 22:01:18,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_133_mp_rank_00_optim_states.pt 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt. 16: [2023-04-28 22:01:18,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_134_mp_rank_00_optim_states.pt 16: [2023-04-28 22:01:18,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_131_mp_rank_00_optim_states.pt 16: [2023-04-28 22:01:18,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_135_mp_rank_00_optim_states.pt 16: [2023-04-28 22:01:18,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_130_mp_rank_00_optim_states.pt 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
16: [2023-04-28 22:01:18,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_132_mp_rank_00_optim_states.pt 16: [2023-04-28 22:01:18,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_128_mp_rank_00_optim_states.pt 16: [2023-04-28 22:01:18,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_129_mp_rank_00_optim_states.pt 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 16: [2023-04-28 22:01:18,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 9: [2023-04-28 22:01:18,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt. 9: [2023-04-28 22:01:18,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_79_mp_rank_00_optim_states.pt 9: [2023-04-28 22:01:18,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 9: [2023-04-28 22:01:18,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt. 9: [2023-04-28 22:01:18,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_72_mp_rank_00_optim_states.pt 9: [2023-04-28 22:01:18,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
9: [2023-04-28 22:01:18,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt. 9: [2023-04-28 22:01:18,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_74_mp_rank_00_optim_states.pt 9: [2023-04-28 22:01:18,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 9: [2023-04-28 22:01:18,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt. 9: [2023-04-28 22:01:18,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_75_mp_rank_00_optim_states.pt 9: [2023-04-28 22:01:18,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 9: [2023-04-28 22:01:18,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt. 9: [2023-04-28 22:01:18,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_77_mp_rank_00_optim_states.pt 9: [2023-04-28 22:01:18,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 9: [2023-04-28 22:01:18,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt. 9: [2023-04-28 22:01:18,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_76_mp_rank_00_optim_states.pt 9: [2023-04-28 22:01:18,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
28: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt. 28: [2023-04-28 22:01:18,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_230_mp_rank_00_optim_states.pt 28: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt. 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt. 15: [2023-04-28 22:01:18,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt. 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt. 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt. 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt. 
15: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_124_mp_rank_00_optim_states.pt 15: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_125_mp_rank_00_optim_states.pt 15: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_123_mp_rank_00_optim_states.pt 15: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_126_mp_rank_00_optim_states.pt 15: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_127_mp_rank_00_optim_states.pt 15: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_120_mp_rank_00_optim_states.pt 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
18: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt. 18: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt. 18: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt. 18: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt. 18: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt. 18: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt. 18: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt. 18: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt. 15: [2023-04-28 22:01:18,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt. 
18: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_150_mp_rank_00_optim_states.pt 18: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_151_mp_rank_00_optim_states.pt 18: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_146_mp_rank_00_optim_states.pt 18: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_147_mp_rank_00_optim_states.pt 18: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_149_mp_rank_00_optim_states.pt 15: [2023-04-28 22:01:18,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_122_mp_rank_00_optim_states.pt 18: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_148_mp_rank_00_optim_states.pt 18: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_144_mp_rank_00_optim_states.pt 18: [2023-04-28 22:01:18,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_145_mp_rank_00_optim_states.pt 18: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 18: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
18: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 18: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 18: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 18: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 18: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 15: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 18: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-04-28 22:01:18,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-04-28 22:01:18,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-04-28 22:01:18,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-04-28 22:01:18,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-04-28 22:01:18,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-04-28 22:01:18,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-04-28 22:01:18,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-04-28 22:01:18,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is 
ready now! 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 5: [2023-04-28 22:01:18,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt. 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt. 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt. 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt. 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt. 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt. 
8: [2023-04-28 22:01:18,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_69_mp_rank_00_optim_states.pt 8: [2023-04-28 22:01:18,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_65_mp_rank_00_optim_states.pt 8: [2023-04-28 22:01:18,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_66_mp_rank_00_optim_states.pt 8: [2023-04-28 22:01:18,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_67_mp_rank_00_optim_states.pt 8: [2023-04-28 22:01:18,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_70_mp_rank_00_optim_states.pt 8: [2023-04-28 22:01:18,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_64_mp_rank_00_optim_states.pt 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 8: [2023-04-28 22:01:18,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
4: [2023-04-28 22:01:18,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-04-28 22:01:18,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-04-28 22:01:18,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-04-28 22:01:18,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-04-28 22:01:18,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 8: [2023-04-28 22:01:18,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt. 8: [2023-04-28 22:01:18,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_71_mp_rank_00_optim_states.pt 8: [2023-04-28 22:01:18,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 4: [2023-04-28 22:01:18,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 15: [2023-04-28 22:01:18,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt. 4: [2023-04-28 22:01:18,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-04-28 22:01:18,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-04-28 22:01:18,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-04-28 22:01:18,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-04-28 22:01:18,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-04-28 22:01:18,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-04-28 22:01:18,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-04-28 22:01:18,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 15: [2023-04-28 22:01:18,233] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_121_mp_rank_00_optim_states.pt 4: [2023-04-28 22:01:18,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
4: [2023-04-28 22:01:18,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-04-28 22:01:18,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 15: [2023-04-28 22:01:18,233] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 4: [2023-04-28 22:01:18,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 4: [2023-04-28 22:01:18,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 4: [2023-04-28 22:01:18,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 4: [2023-04-28 22:01:18,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 4: [2023-04-28 22:01:18,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 4: [2023-04-28 22:01:18,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 4: [2023-04-28 22:01:18,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-04-28 22:01:18,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-04-28 22:01:18,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-04-28 22:01:18,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-04-28 22:01:18,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-04-28 22:01:18,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-04-28 22:01:18,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved 
checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-04-28 22:01:18,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-04-28 22:01:18,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 3: [2023-04-28 22:01:18,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 8: [2023-04-28 22:01:18,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt. 8: [2023-04-28 22:01:18,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_68_mp_rank_00_optim_states.pt 8: [2023-04-28 22:01:18,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt. 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt. 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt. 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt. 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt. 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt. 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt. 
25: [2023-04-28 22:01:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_206_mp_rank_00_optim_states.pt 25: [2023-04-28 22:01:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_204_mp_rank_00_optim_states.pt 25: [2023-04-28 22:01:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_201_mp_rank_00_optim_states.pt 25: [2023-04-28 22:01:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_205_mp_rank_00_optim_states.pt 25: [2023-04-28 22:01:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_207_mp_rank_00_optim_states.pt 25: [2023-04-28 22:01:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_200_mp_rank_00_optim_states.pt 25: [2023-04-28 22:01:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_203_mp_rank_00_optim_states.pt 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt. 25: [2023-04-28 22:01:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_202_mp_rank_00_optim_states.pt 25: [2023-04-28 22:01:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt. 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt. 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt. 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt. 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt. 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt. 
26: [2023-04-28 22:01:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_212_mp_rank_00_optim_states.pt 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt. 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt. 26: [2023-04-28 22:01:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_208_mp_rank_00_optim_states.pt 26: [2023-04-28 22:01:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_211_mp_rank_00_optim_states.pt 26: [2023-04-28 22:01:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_214_mp_rank_00_optim_states.pt 26: [2023-04-28 22:01:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_213_mp_rank_00_optim_states.pt 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 26: [2023-04-28 22:01:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_215_mp_rank_00_optim_states.pt 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
26: [2023-04-28 22:01:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_210_mp_rank_00_optim_states.pt 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 26: [2023-04-28 22:01:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_209_mp_rank_00_optim_states.pt 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 26: [2023-04-28 22:01:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt. 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt. 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt. 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt. 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt. 
17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt. 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt. 17: [2023-04-28 22:01:18,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_143_mp_rank_00_optim_states.pt 17: [2023-04-28 22:01:18,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_142_mp_rank_00_optim_states.pt 17: [2023-04-28 22:01:18,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_136_mp_rank_00_optim_states.pt 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt. 17: [2023-04-28 22:01:18,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_140_mp_rank_00_optim_states.pt 17: [2023-04-28 22:01:18,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_137_mp_rank_00_optim_states.pt 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
17: [2023-04-28 22:01:18,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_139_mp_rank_00_optim_states.pt 17: [2023-04-28 22:01:18,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_138_mp_rank_00_optim_states.pt 17: [2023-04-28 22:01:18,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_1b1250b1b5/global_step380000/bf16_zero_pp_rank_141_mp_rank_00_optim_states.pt 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 17: [2023-04-28 22:01:18,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step380000 is ready now! 
0: successfully saved checkpoint at iteration 380000 to checkpoints_1b1250b1b5 31: time (ms) | save-checkpoint: 3028.44 31: iteration 380100/ 476837 | consumed samples: 97305600 | consumed tokens: 199281868800 | elapsed time per iteration (s): 0.71 | learning rate: 3.802E-05 | global batch size: 256 | lm loss: 2.414128E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 358.812 | TFLOPs: 21.71 | 31: iteration 380200/ 476837 | consumed samples: 97331200 | consumed tokens: 199334297600 | elapsed time per iteration (s): 0.68 | learning rate: 3.798E-05 | global batch size: 256 | lm loss: 2.412324E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.134 | TFLOPs: 22.76 | 31: iteration 380300/ 476837 | consumed samples: 97356800 | consumed tokens: 199386726400 | elapsed time per iteration (s): 0.68 | learning rate: 3.794E-05 | global batch size: 256 | lm loss: 2.408645E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.832 | TFLOPs: 22.80 | 31: iteration 380400/ 476837 | consumed samples: 97382400 | consumed tokens: 199439155200 | elapsed time per iteration (s): 0.68 | learning rate: 3.791E-05 | global batch size: 256 | lm loss: 2.412881E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.489 | TFLOPs: 22.78 | 31: iteration 380500/ 476837 | consumed samples: 97408000 | consumed tokens: 199491584000 | elapsed time per iteration (s): 0.68 | learning rate: 3.787E-05 | global batch size: 256 | lm loss: 2.416956E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.331 | TFLOPs: 22.65 | 31: iteration 380600/ 476837 | consumed samples: 97433600 | consumed tokens: 199544012800 | elapsed time per 
iteration (s): 0.68 | learning rate: 3.784E-05 | global batch size: 256 | lm loss: 2.408329E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.248 | TFLOPs: 22.76 | 31: iteration 380700/ 476837 | consumed samples: 97459200 | consumed tokens: 199596441600 | elapsed time per iteration (s): 0.68 | learning rate: 3.780E-05 | global batch size: 256 | lm loss: 2.410970E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.063 | TFLOPs: 22.75 | 31: iteration 380800/ 476837 | consumed samples: 97484800 | consumed tokens: 199648870400 | elapsed time per iteration (s): 0.87 | learning rate: 3.776E-05 | global batch size: 256 | lm loss: 2.413381E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 294.159 | TFLOPs: 17.80 | 31: iteration 380900/ 476837 | consumed samples: 97510400 | consumed tokens: 199701299200 | elapsed time per iteration (s): 0.70 | learning rate: 3.773E-05 | global batch size: 256 | lm loss: 2.414347E+00 | grad norm: 0.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 367.516 | TFLOPs: 22.23 | 31: iteration 381000/ 476837 | consumed samples: 97536000 | consumed tokens: 199753728000 | elapsed time per iteration (s): 0.68 | learning rate: 3.769E-05 | global batch size: 256 | lm loss: 2.411696E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.680 | TFLOPs: 22.79 | 31: iteration 381100/ 476837 | consumed samples: 97561600 | consumed tokens: 199806156800 | elapsed time per iteration (s): 0.68 | learning rate: 3.766E-05 | global batch size: 256 | lm loss: 2.410770E+00 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.829 | 
TFLOPs: 22.80 | 31: iteration 381200/ 476837 | consumed samples: 97587200 | consumed tokens: 199858585600 | elapsed time per iteration (s): 0.68 | learning rate: 3.762E-05 | global batch size: 256 | lm loss: 2.413908E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.660 | TFLOPs: 22.79 | 31: iteration 381300/ 476837 | consumed samples: 97612800 | consumed tokens: 199911014400 | elapsed time per iteration (s): 0.68 | learning rate: 3.759E-05 | global batch size: 256 | lm loss: 2.413492E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.258 | TFLOPs: 22.70 | 31: iteration 381400/ 476837 | consumed samples: 97638400 | consumed tokens: 199963443200 | elapsed time per iteration (s): 0.68 | learning rate: 3.755E-05 | global batch size: 256 | lm loss: 2.411630E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.545 | TFLOPs: 22.78 | 31: iteration 381500/ 476837 | consumed samples: 97664000 | consumed tokens: 200015872000 | elapsed time per iteration (s): 0.68 | learning rate: 3.751E-05 | global batch size: 256 | lm loss: 2.407749E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.283 | TFLOPs: 22.76 | 31: iteration 381600/ 476837 | consumed samples: 97689600 | consumed tokens: 200068300800 | elapsed time per iteration (s): 0.68 | learning rate: 3.748E-05 | global batch size: 256 | lm loss: 2.410593E+00 | grad norm: 0.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.515 | TFLOPs: 22.78 | 31: iteration 381700/ 476837 | consumed samples: 97715200 | consumed tokens: 200120729600 | elapsed time per iteration (s): 0.68 | learning rate: 3.744E-05 | global batch size: 256 | lm loss: 2.407018E+00 | grad norm: 
0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.309 | TFLOPs: 22.77 | 31: iteration 381800/ 476837 | consumed samples: 97740800 | consumed tokens: 200173158400 | elapsed time per iteration (s): 0.68 | learning rate: 3.741E-05 | global batch size: 256 | lm loss: 2.407698E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.788 | TFLOPs: 22.79 | 31: iteration 381900/ 476837 | consumed samples: 97766400 | consumed tokens: 200225587200 | elapsed time per iteration (s): 0.68 | learning rate: 3.737E-05 | global batch size: 256 | lm loss: 2.409587E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.826 | TFLOPs: 22.80 | 0: [2023-04-28 22:24:19,476] [INFO] [logging.py:68:log_dist] [Rank 0] step=382000, skipped=0, lr=[3.733781864029757e-05, 3.733781864029757e-05, 3.733781864029757e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 382000/ 476837 | consumed samples: 97792000 | consumed tokens: 200278016000 | elapsed time per iteration (s): 0.68 | learning rate: 3.734E-05 | global batch size: 256 | lm loss: 2.412657E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.728 | TFLOPs: 22.79 | 0: steps: 382000 loss: 2.3719 iter time (s): 0.688 samples/sec: 372.029 31: iteration 382100/ 476837 | consumed samples: 97817600 | consumed tokens: 200330444800 | elapsed time per iteration (s): 0.68 | learning rate: 3.730E-05 | global batch size: 256 | lm loss: 2.411487E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.802 | TFLOPs: 22.80 | 31: iteration 382200/ 476837 | consumed samples: 97843200 | consumed tokens: 200382873600 | elapsed time per iteration (s): 0.68 | learning rate: 3.727E-05 | global 
batch size: 256 | lm loss: 2.414852E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.286 | TFLOPs: 22.76 | 31: iteration 382300/ 476837 | consumed samples: 97868800 | consumed tokens: 200435302400 | elapsed time per iteration (s): 0.77 | learning rate: 3.723E-05 | global batch size: 256 | lm loss: 2.407572E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 334.484 | TFLOPs: 20.24 | 31: iteration 382400/ 476837 | consumed samples: 97894400 | consumed tokens: 200487731200 | elapsed time per iteration (s): 0.91 | learning rate: 3.720E-05 | global batch size: 256 | lm loss: 2.411297E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 280.276 | TFLOPs: 16.96 | 31: iteration 382500/ 476837 | consumed samples: 97920000 | consumed tokens: 200540160000 | elapsed time per iteration (s): 0.68 | learning rate: 3.716E-05 | global batch size: 256 | lm loss: 2.412552E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.427 | TFLOPs: 22.65 | 31: iteration 382600/ 476837 | consumed samples: 97945600 | consumed tokens: 200592588800 | elapsed time per iteration (s): 0.68 | learning rate: 3.713E-05 | global batch size: 256 | lm loss: 2.409923E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.371 | TFLOPs: 22.77 | 31: iteration 382700/ 476837 | consumed samples: 97971200 | consumed tokens: 200645017600 | elapsed time per iteration (s): 0.68 | learning rate: 3.709E-05 | global batch size: 256 | lm loss: 2.407852E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.843 | TFLOPs: 22.80 | 31: iteration 382800/ 476837 | consumed 
samples: 97996800 | consumed tokens: 200697446400 | elapsed time per iteration (s): 0.68 | learning rate: 3.706E-05 | global batch size: 256 | lm loss: 2.406965E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.064 | TFLOPs: 22.69 | 31: iteration 382900/ 476837 | consumed samples: 98022400 | consumed tokens: 200749875200 | elapsed time per iteration (s): 0.68 | learning rate: 3.702E-05 | global batch size: 256 | lm loss: 2.413610E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.304 | TFLOPs: 22.77 | 31: iteration 383000/ 476837 | consumed samples: 98048000 | consumed tokens: 200802304000 | elapsed time per iteration (s): 0.68 | learning rate: 3.699E-05 | global batch size: 256 | lm loss: 2.408670E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.632 | TFLOPs: 22.72 | 31: iteration 383100/ 476837 | consumed samples: 98073600 | consumed tokens: 200854732800 | elapsed time per iteration (s): 0.68 | learning rate: 3.695E-05 | global batch size: 256 | lm loss: 2.405039E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.947 | TFLOPs: 22.74 | 31: iteration 383200/ 476837 | consumed samples: 98099200 | consumed tokens: 200907161600 | elapsed time per iteration (s): 0.68 | learning rate: 3.692E-05 | global batch size: 256 | lm loss: 2.412034E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.536 | TFLOPs: 22.66 | 31: iteration 383300/ 476837 | consumed samples: 98124800 | consumed tokens: 200959590400 | elapsed time per iteration (s): 0.68 | learning rate: 3.688E-05 | global batch size: 256 | lm loss: 2.406640E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 375.219 | TFLOPs: 22.70 | 31: iteration 383400/ 476837 | consumed samples: 98150400 | consumed tokens: 201012019200 | elapsed time per iteration (s): 0.68 | learning rate: 3.685E-05 | global batch size: 256 | lm loss: 2.405789E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.814 | TFLOPs: 22.74 | 31: iteration 383500/ 476837 | consumed samples: 98176000 | consumed tokens: 201064448000 | elapsed time per iteration (s): 0.68 | learning rate: 3.681E-05 | global batch size: 256 | lm loss: 2.407386E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.530 | TFLOPs: 22.66 | 31: iteration 383600/ 476837 | consumed samples: 98201600 | consumed tokens: 201116876800 | elapsed time per iteration (s): 0.68 | learning rate: 3.678E-05 | global batch size: 256 | lm loss: 2.404967E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.657 | TFLOPs: 22.79 | 31: iteration 383700/ 476837 | consumed samples: 98227200 | consumed tokens: 201169305600 | elapsed time per iteration (s): 0.68 | learning rate: 3.674E-05 | global batch size: 256 | lm loss: 2.408388E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.142 | TFLOPs: 22.76 | 31: iteration 383800/ 476837 | consumed samples: 98252800 | consumed tokens: 201221734400 | elapsed time per iteration (s): 0.68 | learning rate: 3.671E-05 | global batch size: 256 | lm loss: 2.411967E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.568 | TFLOPs: 22.78 | 31: iteration 383900/ 476837 | consumed samples: 98278400 | consumed tokens: 201274163200 | elapsed time per iteration (s): 0.68 | learning rate: 3.667E-05 
| global batch size: 256 | lm loss: 2.410712E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.110 | TFLOPs: 22.75 | 0: [2023-04-28 22:47:33,328] [INFO] [logging.py:68:log_dist] [Rank 0] step=384000, skipped=0, lr=[3.6637445373794216e-05, 3.6637445373794216e-05, 3.6637445373794216e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 384000/ 476837 | consumed samples: 98304000 | consumed tokens: 201326592000 | elapsed time per iteration (s): 0.68 | learning rate: 3.664E-05 | global batch size: 256 | lm loss: 2.408664E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.311 | TFLOPs: 22.77 | 0: steps: 384000 loss: 2.3900 iter time (s): 0.694 samples/sec: 368.656 31: iteration 384100/ 476837 | consumed samples: 98329600 | consumed tokens: 201379020800 | elapsed time per iteration (s): 0.68 | learning rate: 3.660E-05 | global batch size: 256 | lm loss: 2.410866E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.337 | TFLOPs: 22.77 | 31: iteration 384200/ 476837 | consumed samples: 98355200 | consumed tokens: 201431449600 | elapsed time per iteration (s): 0.68 | learning rate: 3.657E-05 | global batch size: 256 | lm loss: 2.404954E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.397 | TFLOPs: 22.77 | 31: iteration 384300/ 476837 | consumed samples: 98380800 | consumed tokens: 201483878400 | elapsed time per iteration (s): 0.68 | learning rate: 3.653E-05 | global batch size: 256 | lm loss: 2.406465E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.474 | TFLOPs: 22.78 | 31: iteration 384400/ 476837 | consumed samples: 98406400 | consumed tokens: 201536307200 | elapsed 
time per iteration (s): 0.68 | learning rate: 3.650E-05 | global batch size: 256 | lm loss: 2.411609E+00 | grad norm: 0.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.030 | TFLOPs: 22.69 | 31: iteration 384500/ 476837 | consumed samples: 98432000 | consumed tokens: 201588736000 | elapsed time per iteration (s): 0.68 | learning rate: 3.646E-05 | global batch size: 256 | lm loss: 2.407298E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.752 | TFLOPs: 22.79 | 31: iteration 384600/ 476837 | consumed samples: 98457600 | consumed tokens: 201641164800 | elapsed time per iteration (s): 0.68 | learning rate: 3.643E-05 | global batch size: 256 | lm loss: 2.408739E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.753 | TFLOPs: 22.79 | 31: iteration 384700/ 476837 | consumed samples: 98483200 | consumed tokens: 201693593600 | elapsed time per iteration (s): 0.68 | learning rate: 3.640E-05 | global batch size: 256 | lm loss: 2.412944E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.535 | TFLOPs: 22.78 | 31: iteration 384800/ 476837 | consumed samples: 98508800 | consumed tokens: 201746022400 | elapsed time per iteration (s): 0.68 | learning rate: 3.636E-05 | global batch size: 256 | lm loss: 2.411234E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.586 | TFLOPs: 22.78 | 31: iteration 384900/ 476837 | consumed samples: 98534400 | consumed tokens: 201798451200 | elapsed time per iteration (s): 0.68 | learning rate: 3.633E-05 | global batch size: 256 | lm loss: 2.410758E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.146 
| TFLOPs: 22.76 | 31: iteration 385000/ 476837 | consumed samples: 98560000 | consumed tokens: 201850880000 | elapsed time per iteration (s): 0.68 | learning rate: 3.629E-05 | global batch size: 256 | lm loss: 2.405364E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.435 | TFLOPs: 22.77 | 31: iteration 385100/ 476837 | consumed samples: 98585600 | consumed tokens: 201903308800 | elapsed time per iteration (s): 0.68 | learning rate: 3.626E-05 | global batch size: 256 | lm loss: 2.405156E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.768 | TFLOPs: 22.79 | 31: iteration 385200/ 476837 | consumed samples: 98611200 | consumed tokens: 201955737600 | elapsed time per iteration (s): 0.68 | learning rate: 3.622E-05 | global batch size: 256 | lm loss: 2.407690E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.343 | TFLOPs: 22.77 | 31: iteration 385300/ 476837 | consumed samples: 98636800 | consumed tokens: 202008166400 | elapsed time per iteration (s): 0.68 | learning rate: 3.619E-05 | global batch size: 256 | lm loss: 2.404859E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.373 | TFLOPs: 22.77 | 31: iteration 385400/ 476837 | consumed samples: 98662400 | consumed tokens: 202060595200 | elapsed time per iteration (s): 0.68 | learning rate: 3.615E-05 | global batch size: 256 | lm loss: 2.404673E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.335 | TFLOPs: 22.77 | 31: iteration 385500/ 476837 | consumed samples: 98688000 | consumed tokens: 202113024000 | elapsed time per iteration (s): 0.68 | learning rate: 3.612E-05 | global batch size: 256 | lm loss: 2.410277E+00 | grad 
norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.532 | TFLOPs: 22.78 | 31: iteration 385600/ 476837 | consumed samples: 98713600 | consumed tokens: 202165452800 | elapsed time per iteration (s): 0.68 | learning rate: 3.609E-05 | global batch size: 256 | lm loss: 2.411638E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.802 | TFLOPs: 22.80 | 31: iteration 385700/ 476837 | consumed samples: 98739200 | consumed tokens: 202217881600 | elapsed time per iteration (s): 0.68 | learning rate: 3.605E-05 | global batch size: 256 | lm loss: 2.405593E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.382 | TFLOPs: 22.77 | 31: iteration 385800/ 476837 | consumed samples: 98764800 | consumed tokens: 202270310400 | elapsed time per iteration (s): 0.68 | learning rate: 3.602E-05 | global batch size: 256 | lm loss: 2.407744E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.266 | TFLOPs: 22.70 | 31: iteration 385900/ 476837 | consumed samples: 98790400 | consumed tokens: 202322739200 | elapsed time per iteration (s): 0.68 | learning rate: 3.598E-05 | global batch size: 256 | lm loss: 2.408136E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.243 | TFLOPs: 22.76 | 0: [2023-04-28 23:10:13,732] [INFO] [logging.py:68:log_dist] [Rank 0] step=386000, skipped=0, lr=[3.5950068331508617e-05, 3.5950068331508617e-05, 3.5950068331508617e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 386000/ 476837 | consumed samples: 98816000 | consumed tokens: 202375168000 | elapsed time per iteration (s): 0.68 | learning rate: 3.595E-05 | global batch size: 256 | lm loss: 2.401870E+00 | grad norm: 0.450 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.696 | TFLOPs: 22.79 | 0: steps: 386000 loss: 2.3689 iter time (s): 0.678 samples/sec: 377.845 31: iteration 386100/ 476837 | consumed samples: 98841600 | consumed tokens: 202427596800 | elapsed time per iteration (s): 0.73 | learning rate: 3.592E-05 | global batch size: 256 | lm loss: 2.409046E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 350.911 | TFLOPs: 21.23 | 31: iteration 386200/ 476837 | consumed samples: 98867200 | consumed tokens: 202480025600 | elapsed time per iteration (s): 0.98 | learning rate: 3.588E-05 | global batch size: 256 | lm loss: 2.409925E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 262.304 | TFLOPs: 15.87 | 31: iteration 386300/ 476837 | consumed samples: 98892800 | consumed tokens: 202532454400 | elapsed time per iteration (s): 0.70 | learning rate: 3.585E-05 | global batch size: 256 | lm loss: 2.409737E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 363.976 | TFLOPs: 22.02 | 31: iteration 386400/ 476837 | consumed samples: 98918400 | consumed tokens: 202584883200 | elapsed time per iteration (s): 0.68 | learning rate: 3.581E-05 | global batch size: 256 | lm loss: 2.407455E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.810 | TFLOPs: 22.68 | 31: iteration 386500/ 476837 | consumed samples: 98944000 | consumed tokens: 202637312000 | elapsed time per iteration (s): 0.68 | learning rate: 3.578E-05 | global batch size: 256 | lm loss: 2.404212E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.978 | TFLOPs: 22.75 | 31: iteration 386600/ 476837 | 
consumed samples: 98969600 | consumed tokens: 202689740800 | elapsed time per iteration (s): 0.68 | learning rate: 3.575E-05 | global batch size: 256 | lm loss: 2.407543E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.312 | TFLOPs: 22.77 | 31: iteration 386700/ 476837 | consumed samples: 98995200 | consumed tokens: 202742169600 | elapsed time per iteration (s): 0.68 | learning rate: 3.571E-05 | global batch size: 256 | lm loss: 2.405350E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.571 | TFLOPs: 22.78 | 31: iteration 386800/ 476837 | consumed samples: 99020800 | consumed tokens: 202794598400 | elapsed time per iteration (s): 0.68 | learning rate: 3.568E-05 | global batch size: 256 | lm loss: 2.407609E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.722 | TFLOPs: 22.79 | 31: iteration 386900/ 476837 | consumed samples: 99046400 | consumed tokens: 202847027200 | elapsed time per iteration (s): 0.68 | learning rate: 3.565E-05 | global batch size: 256 | lm loss: 2.401824E+00 | grad norm: 0.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.038 | TFLOPs: 22.75 | 31: iteration 387000/ 476837 | consumed samples: 99072000 | consumed tokens: 202899456000 | elapsed time per iteration (s): 0.68 | learning rate: 3.561E-05 | global batch size: 256 | lm loss: 2.405267E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.559 | TFLOPs: 22.72 | 31: iteration 387100/ 476837 | consumed samples: 99097600 | consumed tokens: 202951884800 | elapsed time per iteration (s): 0.68 | learning rate: 3.558E-05 | global batch size: 256 | lm loss: 2.410500E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 376.338 | TFLOPs: 22.77 | 31: iteration 387200/ 476837 | consumed samples: 99123200 | consumed tokens: 203004313600 | elapsed time per iteration (s): 0.68 | learning rate: 3.554E-05 | global batch size: 256 | lm loss: 2.405784E+00 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.903 | TFLOPs: 22.80 | 31: iteration 387300/ 476837 | consumed samples: 99148800 | consumed tokens: 203056742400 | elapsed time per iteration (s): 0.68 | learning rate: 3.551E-05 | global batch size: 256 | lm loss: 2.403971E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.003 | TFLOPs: 22.75 | 31: iteration 387400/ 476837 | consumed samples: 99174400 | consumed tokens: 203109171200 | elapsed time per iteration (s): 0.68 | learning rate: 3.548E-05 | global batch size: 256 | lm loss: 2.407089E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.680 | TFLOPs: 22.79 | 31: iteration 387500/ 476837 | consumed samples: 99200000 | consumed tokens: 203161600000 | elapsed time per iteration (s): 0.68 | learning rate: 3.544E-05 | global batch size: 256 | lm loss: 2.403063E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.649 | TFLOPs: 22.79 | 31: iteration 387600/ 476837 | consumed samples: 99225600 | consumed tokens: 203214028800 | elapsed time per iteration (s): 0.68 | learning rate: 3.541E-05 | global batch size: 256 | lm loss: 2.410331E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.746 | TFLOPs: 22.79 | 31: iteration 387700/ 476837 | consumed samples: 99251200 | consumed tokens: 203266457600 | elapsed time per iteration (s): 0.68 | learning 
rate: 3.538E-05 | global batch size: 256 | lm loss: 2.401900E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.764 | TFLOPs: 22.79 | 31: iteration 387800/ 476837 | consumed samples: 99276800 | consumed tokens: 203318886400 | elapsed time per iteration (s): 0.68 | learning rate: 3.534E-05 | global batch size: 256 | lm loss: 2.407204E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.006 | TFLOPs: 22.75 | 31: iteration 387900/ 476837 | consumed samples: 99302400 | consumed tokens: 203371315200 | elapsed time per iteration (s): 0.68 | learning rate: 3.531E-05 | global batch size: 256 | lm loss: 2.406234E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.755 | TFLOPs: 22.79 | 0: [2023-04-28 23:33:31,466] [INFO] [logging.py:68:log_dist] [Rank 0] step=388000, skipped=0, lr=[3.5275809282730406e-05, 3.5275809282730406e-05, 3.5275809282730406e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 31: iteration 388000/ 476837 | consumed samples: 99328000 | consumed tokens: 203423744000 | elapsed time per iteration (s): 0.68 | learning rate: 3.528E-05 | global batch size: 256 | lm loss: 2.403890E+00 | grad norm: 0.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.486 | TFLOPs: 22.66 | 0: steps: 388000 loss: 2.4194 iter time (s): 0.696 samples/sec: 368.043 31: iteration 388100/ 476837 | consumed samples: 99353600 | consumed tokens: 203476172800 | elapsed time per iteration (s): 0.68 | learning rate: 3.524E-05 | global batch size: 256 | lm loss: 2.403173E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.406 | TFLOPs: 22.77 | 31: iteration 388200/ 476837 | consumed samples: 99379200 | consumed tokens: 
203528601600 | elapsed time per iteration (s): 0.68 | learning rate: 3.521E-05 | global batch size: 256 | lm loss: 2.403635E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.340 | TFLOPs: 22.77 | 31: iteration 388300/ 476837 | consumed samples: 99404800 | consumed tokens: 203581030400 | elapsed time per iteration (s): 0.68 | learning rate: 3.518E-05 | global batch size: 256 | lm loss: 2.404268E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.783 | TFLOPs: 22.73 | 31: iteration 388400/ 476837 | consumed samples: 99430400 | consumed tokens: 203633459200 | elapsed time per iteration (s): 0.68 | learning rate: 3.514E-05 | global batch size: 256 | lm loss: 2.397271E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.794 | TFLOPs: 22.80 | 31: iteration 388500/ 476837 | consumed samples: 99456000 | consumed tokens: 203685888000 | elapsed time per iteration (s): 0.68 | learning rate: 3.511E-05 | global batch size: 256 | lm loss: 2.404011E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.481 | TFLOPs: 22.72 | 31: iteration 388600/ 476837 | consumed samples: 99481600 | consumed tokens: 203738316800 | elapsed time per iteration (s): 0.68 | learning rate: 3.508E-05 | global batch size: 256 | lm loss: 2.402312E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.639 | TFLOPs: 22.79 | 31: iteration 388700/ 476837 | consumed samples: 99507200 | consumed tokens: 203790745600 | elapsed time per iteration (s): 0.68 | learning rate: 3.504E-05 | global batch size: 256 | lm loss: 2.403403E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 375.393 | TFLOPs: 22.71 | 31: iteration 388800/ 476837 | consumed samples: 99532800 | consumed tokens: 203843174400 | elapsed time per iteration (s): 0.68 | learning rate: 3.501E-05 | global batch size: 256 | lm loss: 2.405939E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.854 | TFLOPs: 22.74 | 31: iteration 388900/ 476837 | consumed samples: 99558400 | consumed tokens: 203895603200 | elapsed time per iteration (s): 0.68 | learning rate: 3.498E-05 | global batch size: 256 | lm loss: 2.402500E+00 | grad norm: 0.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 374.714 | TFLOPs: 22.67 | 31: iteration 389000/ 476837 | consumed samples: 99584000 | consumed tokens: 203948032000 | elapsed time per iteration (s): 0.68 | learning rate: 3.494E-05 | global batch size: 256 | lm loss: 2.406354E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.831 | TFLOPs: 22.74 | 31: iteration 389100/ 476837 | consumed samples: 99609600 | consumed tokens: 204000460800 | elapsed time per iteration (s): 0.72 | learning rate: 3.491E-05 | global batch size: 256 | lm loss: 2.404653E+00 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 357.247 | TFLOPs: 21.61 | 31: iteration 389200/ 476837 | consumed samples: 99635200 | consumed tokens: 204052889600 | elapsed time per iteration (s): 0.69 | learning rate: 3.488E-05 | global batch size: 256 | lm loss: 2.408411E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.067 | TFLOPs: 22.51 | 31: iteration 389300/ 476837 | consumed samples: 99660800 | consumed tokens: 204105318400 | elapsed time per iteration (s): 0.68 | learning rate: 3.484E-05 | global batch size: 256 | lm 
loss: 2.407938E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.170 | TFLOPs: 22.76 | 31: iteration 389400/ 476837 | consumed samples: 99686400 | consumed tokens: 204157747200 | elapsed time per iteration (s): 0.68 | learning rate: 3.481E-05 | global batch size: 256 | lm loss: 2.402218E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.002 | TFLOPs: 22.75 | 31: iteration 389500/ 476837 | consumed samples: 99712000 | consumed tokens: 204210176000 | elapsed time per iteration (s): 0.68 | learning rate: 3.478E-05 | global batch size: 256 | lm loss: 2.404432E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.535 | TFLOPs: 22.78 | 31: iteration 389600/ 476837 | consumed samples: 99737600 | consumed tokens: 204262604800 | elapsed time per iteration (s): 0.68 | learning rate: 3.475E-05 | global batch size: 256 | lm loss: 2.404395E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.763 | TFLOPs: 22.73 | 31: iteration 389700/ 476837 | consumed samples: 99763200 | consumed tokens: 204315033600 | elapsed time per iteration (s): 0.68 | learning rate: 3.471E-05 | global batch size: 256 | lm loss: 2.402872E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.683 | TFLOPs: 22.79 | 31: iteration 389800/ 476837 | consumed samples: 99788800 | consumed tokens: 204367462400 | elapsed time per iteration (s): 0.68 | learning rate: 3.468E-05 | global batch size: 256 | lm loss: 2.402224E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.570 | TFLOPs: 22.78 | 31: iteration 389900/ 476837 | consumed samples: 99814400 | 
consumed tokens: 204419891200 | elapsed time per iteration (s): 0.69 | learning rate: 3.465E-05 | global batch size: 256 | lm loss: 2.407076E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 372.976 | TFLOPs: 22.56 |
0: [2023-04-28 23:56:37,954] [INFO] [logging.py:68:log_dist] [Rank 0] step=390000, skipped=0, lr=[3.46147876728882e-05, 3.46147876728882e-05, 3.46147876728882e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
31: iteration 390000/ 476837 | consumed samples: 99840000 | consumed tokens: 204472320000 | elapsed time per iteration (s): 0.88 | learning rate: 3.461E-05 | global batch size: 256 | lm loss: 2.402847E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 290.262 | TFLOPs: 17.56 |
0: steps: 390000 loss: 2.3901 iter time (s): 0.690 samples/sec: 371.012
31: -------------------------------------------------------------------------------------------------
31: validation loss at iteration 390000 | lm loss value: 3.001098E+00 | lm loss PPL: 2.010761E+01 |
31: -------------------------------------------------------------------------------------------------
31: iteration 390100/ 476837 | consumed samples: 99865600 | consumed tokens: 204524748800 | elapsed time per iteration (s): 0.84 | learning rate: 3.458E-05 | global batch size: 256 | lm loss: 2.402491E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 303.979 | TFLOPs: 18.39 |
31: iteration 390200/ 476837 | consumed samples: 99891200 | consumed tokens: 204577177600 | elapsed time per iteration (s): 0.68 | learning rate: 3.455E-05 | global batch size: 256 | lm loss: 2.406071E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 376.559 | TFLOPs: 22.78 |